The Paris conference about Data
DataXDay is a technical conference about the world of data, for enthusiasts and professionals alike.
Big Data, Smart Data, batch or real-time processing, machine learning, deep learning, and more: these are all topics we are passionate about.
If you want to learn, share, and exchange ideas on these data topics, come and meet internationally renowned experts who work on data exploitation and its associated technologies.
DataXDay themes
The future of Data Science
Convinced of the value of Data Science, many companies have started integrating it into their processing. The challenge now lies in maintenance, deployment, and continuous improvement. What is the future of Data Science, in terms of methodologies and putting models into production?
Streaming: real-time data
In addition to the need to handle large volumes of data, demand for real-time analysis is growing. Managing this “fast data” is now at the heart of data engineering.
Which technologies should be used, and which patterns applied, to design efficient and stable pipelines?
The essentials of a Big Data architecture
Not so long ago, the main criticism levelled at Big Data technologies was their lack of maturity. These technologies have evolved considerably since then and are now more stable and robust. What is the state of the art today?
Programme
Welcome and Breakfast
Opening
Thanks to machine learning and AI, applications can now be created that see, hear, and understand the world around them. Learn how you can easily infuse AI into your business today. In addition to a guided walkthrough of Google Cloud's easy-to-use machine learning APIs (Cloud Vision, Cloud Video Intelligence, Cloud Speech, Cloud Natural Language, and Cloud Translation), we'll demonstrate how Google Cloud AutoML enables developers with limited machine learning expertise to train high-quality models by leveraging Google’s state-of-the-art transfer learning and Neural Architecture Search technology.
The Kafka ecosystem goes way beyond the brokers: Kafka Connect, Kafka Streams, and KSQL are amazing tools!
I will walk you through the implementation of all these components, with a focus on streaming and monitoring.
Come join me to learn how to leverage Kafka to put your data in motion!
Break
Apache NiFi provides a revolutionary data flow management system with a broad range of integrations with existing data production, consumption, and analysis ecosystems, backed by robust data delivery and provenance infrastructure. This talk will mainly focus on how to manage the lifecycle of workflows.
Transforming pictures into memories - PHOTOBOX (FR)
Photobox's business is about pictures and derived products: we process 2 to 6 million photos every day. To suggest suitable products to our customers, we need to handle and better understand the content of their pictures.
Since the number of personal photos has greatly increased thanks to the development of digital cameras and smartphones, scalability is a must.
The goal of this presentation is to introduce our large scale automatic photo labelling pipeline.
Lunch Break
Ever been stuck on a data science use case where every approach seems too hard? Graph theory, which describes a system purely in terms of nodes and links, could be your answer! In a practical example, we’ll try to find data science communities and their leaders on LinkedIn. Challenge accepted?
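To make “communities and leaders” concrete, here is a minimal Python sketch on a toy graph. It is not the speakers' method: it uses a deliberately simple heuristic (drop “weak ties”, i.e. links that belong to no triangle, then take connected components) and defines a community's leader as the member with the most links inside it. The function names and the example network are hypothetical; on real networks one would typically reach for an established algorithm such as Louvain or label propagation.

```python
from collections import defaultdict

# Hypothetical toy network: two tightly knit groups joined by one "bridge" link.
EDGES = [
    ("alice", "bob"), ("alice", "carol"), ("bob", "carol"),
    ("carol", "dave"), ("alice", "dave"),
    ("dave", "erin"),  # the bridge between the two groups
    ("erin", "frank"), ("erin", "grace"), ("frank", "grace"),
    ("grace", "heidi"), ("frank", "heidi"),
]


def build_graph(edges):
    """Return undirected adjacency sets built from (a, b) pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def communities(graph):
    """Drop weak ties (edges in no triangle), then return the connected components."""
    # Keep an edge n-m only if n and m share at least one common neighbour.
    strong = {n: {m for m in nbrs if graph[n] & graph[m]} for n, nbrs in graph.items()}
    seen, result = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        # Depth-first search over the remaining (strong) edges only.
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(strong[node] - component)
        seen |= component
        result.append(component)
    return result


def leader(graph, community):
    """Return the member with the most links inside the community (ties: smallest name)."""
    return max(sorted(community), key=lambda n: len(graph[n] & community))


if __name__ == "__main__":
    g = build_graph(EDGES)
    for group in communities(g):
        print(sorted(group), "leader:", leader(g, group))
```

On the toy network, the dave-erin link shares no common neighbour, so it is dropped and two groups remain; alice and frank come out as leaders (carol and grace have as many internal links, and ties are broken alphabetically).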
The internals of query execution in Spark SQL (EN)
If you want to get even slightly better performance from your structured queries (whether batch or streaming), you have to peek at the foundations of the Dataset API, starting with QueryExecution. That's where every query ends up, and where my talk starts. The talk will show you the stages a structured query goes through before execution in Spark SQL. I'll cover the different phases of query execution and the logical and physical optimizations. At the end, I'll do a live coding session showing the steps to write logical and physical optimizations in Scala.
Break
This talk will cover how we redesigned our analytics API from the ground up to serve metrics in near real time from billions of events per day. We'll go from the tools we considered for the job to how we actually implemented our solution, from the datastore up to the whole data pipeline and its API, leveraging Golang, Kubernetes, GCP, and Citus.
Real-Time Access log analysis - BLABLACAR (EN)
At BlaBlaCar, we have built a streaming platform to gain fast insights into the usage of our services. I will show you how BlaBlaCar built an automatic streaming analysis of access logs to improve security and gain fine-grained knowledge of platform usage.
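As an illustration of what streaming access-log analysis can look like, here is a minimal Python sketch; it is not BlaBlaCar's implementation. It assumes logs in the Apache/Nginx common or combined format, and it applies a hypothetical security rule: flag a client IP that sends more than `threshold` requests within a sliding window of `window` seconds. The class and function names, the window size, and the threshold are all illustrative assumptions; in production this logic would typically run inside a stream processing engine rather than a single Python process.

```python
import re
from collections import defaultdict, deque
from datetime import datetime

# Matches the start of an Apache/Nginx "common" or "combined" log line.
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3})'
)


def parse_line(line):
    """Return (ip, unix_ts, path, status), or None if the line does not match."""
    m = LOG_RE.match(line)
    if m is None:
        return None
    ts = datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z").timestamp()
    return m["ip"], ts, m["path"], int(m["status"])


class SlidingWindowCounter:
    """Counts requests per key over the last `window` seconds.

    `observe` returns True when a key exceeds `threshold` requests inside the
    window: a hypothetical rule for spotting scrapers or brute-force attempts.
    """

    def __init__(self, window=60.0, threshold=100):
        self.window = window
        self.threshold = threshold
        self.events = defaultdict(deque)  # key -> timestamps, oldest first

    def observe(self, key, ts):
        q = self.events[key]
        q.append(ts)
        # Evict timestamps that have left the window (assumes roughly time-ordered input).
        while q and q[0] <= ts - self.window:
            q.popleft()
        return len(q) > self.threshold
```

A deque per key keeps each update cheap, since old timestamps are only ever removed from the left; the trade-off is memory proportional to the number of requests still inside the window.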
Keynote
Cocktail
Welcome and Breakfast
Opening
Build, train, and deploy machine learning models at scale
To most developers, machine learning often feels a lot harder than it should be, because the process of building and training models and then deploying them into production is too complicated and too slow.
Amazon SageMaker includes modules that can be used together or independently to build, train, and deploy your machine learning models.
Deep learning for vision into the wild - HEURITECH (EN)
Beyond the AI hype, significant new possibilities in the world of computer vision have arisen in the last few years. However, deploying computer vision solutions still requires expert vision knowledge, business understanding, solid engineering, and smart processes. I’ll present the challenges of computer vision applied to a vertical domain such as fashion, and how we solved them at Heuritech.
The wonders of deep learning: how to leverage it for natural language processing - TENDAM (EN)
In recent years, deep learning (DL) has proven to be a transformative force, making impressive advances in many fields. Within natural language processing (NLP) in particular, deep learning has outperformed many former state-of-the-art approaches, for example in machine translation and named entity recognition (NER). In this talk, I will present various deep learning algorithms and architectures for NLP, with examples of how they can be leveraged in real-world applications.
Break
Tensors in the sky with CloudML - XEBIA (FR)
Out of curiosity, ask the other people in the conference room who has already developed neural networks: you will see a lot of hands go up. Then ask them how many of those models run in production: epic fail.
Come and see a solution to train and deploy TensorFlow models in the cloud using Google CloudML.
Data lineage is defined as a data life cycle that includes the data’s origin and where it moves over time. It has become a crucial component of any data-centric company, whether for documentation, regulatory compliance, data quality, or business impact assessment. This talk will offer an overview of the different approaches to constructing and visualizing metadata and data lineage in a Big Data environment.
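At its core, lineage metadata can be modelled as a directed graph in which an edge `source -> target` means “target is derived from source”. The minimal Python sketch below assumes that model; it is not Zeenea's product or any specific tool's API, and the class, method, and dataset names are hypothetical. Walking the graph upstream answers “where does this data come from?”, and walking it downstream answers “what is impacted if this dataset changes?”.

```python
from collections import defaultdict, deque


class LineageGraph:
    """Directed lineage graph: an edge source -> target means target is derived from source."""

    def __init__(self):
        self.downstream = defaultdict(set)
        self.upstream = defaultdict(set)

    def add_flow(self, source, target):
        """Record that `target` is produced (at least partly) from `source`."""
        self.downstream[source].add(target)
        self.upstream[target].add(source)

    @staticmethod
    def _walk(start, edges):
        """Breadth-first traversal; returns every node reachable from `start`."""
        seen, queue = set(), deque([start])
        while queue:
            node = queue.popleft()
            for nxt in edges.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def origins(self, dataset):
        """Every upstream dataset that feeds `dataset`, directly or indirectly."""
        return self._walk(dataset, self.upstream)

    def sources(self, dataset):
        """Upstream datasets with no parents: where the data originally comes from."""
        return {d for d in self.origins(dataset) if not self.upstream.get(d)}

    def impact(self, dataset):
        """Every downstream dataset affected if `dataset` changes."""
        return self._walk(dataset, self.downstream)
```

For example, after recording the hypothetical flows `crm.customers -> mart.customer_360`, `web.clickstream -> staging.sessions`, `staging.sessions -> mart.customer_360`, and `mart.customer_360 -> dashboard.churn`, calling `sources("dashboard.churn")` returns the two raw inputs, while `impact("web.clickstream")` lists the three derived datasets. Keeping both edge directions costs twice the memory but makes each query a single traversal.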
Lunch Break
Join a data scientist on the journey to industrialization... From notebook to proof of concept, and from proof of concept to production, we will cover what happened at Air France. These won't be golden rules, but a true story. What exactly does industrializing data science mean? How do you package data science models? How should the roles of data scientists and data engineers fit together? Is continuous integration a wild dream for data scientists? This journey will give you key concepts that worked at Air France, and might shed new light on your own data science journey.
Enhancing medical student practice with patient-like chatbots - MINISTÈRE DES SOLIDARITÉS ET DE LA SANTÉ (FR)
I tested several platforms for creating chatbots with the objective of simulating a patient coming to the emergency room so that medical students could ask questions to establish a diagnosis.
Major advances in Natural Language Processing and Artificial Intelligence have led to the emergence of chatbot platforms that let you develop your own agent through a web service.
I will present 4 platforms from major technology companies offering their service in French.
Computer vision: a pragmatic alliance between deep learning and a more “traditional” technique - XEBIA (FR)
During DataXDay, you'll hear a lot about machine learning and deep learning. But sometimes, combining those advanced techniques with a more “traditional” approach can improve results spectacularly. See how we, a data scientist and a software engineer, managed to build an identity card recognition API.
Break
Millions of people, objects and ‘things’ connecting with each other is changing the way organisations and consumers interact with each other and the environment around them. Data comes from different geographical locations and across multiple channels. Managing this explosion of high velocity dynamic data while maintaining customer privacy is a challenge with legacy systems.
Visualizing algorithms - TOUCAN-TOCO (EN)
Data scientists, just like their predecessors, statisticians and computer scientists, work on notoriously complex subjects with advanced methods... yet their expertise and practices have a growing impact on everyone's lives. We aim to demonstrate that data storytelling, with its concepts and tools, is key to the future of data science because of its power to convey complex data insights to everyone.
Keynote
Cocktail
Subject to change
Speakers
Talks will be given in English or French. (More information will be available soon in the programme.)
Charles Ollion - Heuritech
Deep learning for vision into the wild
Co-founder - Heuritech
Charles Ollion is co-founder of Heuritech, a startup specialized in deep learning and computer vision. He holds a PhD in machine learning and teaches deep learning at Ecole Polytechnique / EPITA.
Sylvain Friquet - Algolia
Building a Real Time Analytics API at Scale
Software Engineer - Algolia
Sylvain is a software engineer passionate about large-scale infrastructure. He is currently working on Algolia's Analytics feature. Previously, he was CTO of a biotech startup and a software engineer at Facebook, where he worked on graph search and on ads products such as Slideshow Ads.
Vincent Poncet - DataStax
How to get real-time value from your IoT data?
Solutions Engineer, DataStax
Vincent has more than 13 years of experience in IT. He has worked on SOA, ESB, and MDM technologies and, more recently, on Big Data, from Hadoop to Cassandra. He began his career as a consultant and went on to become a software presales engineer. For the past 2 years, Vincent has been helping DataStax customers adopt NoSQL and Big Data technologies.
Florent Ramière - Confluent
Kafka beyond the brokers: Stream processing and Monitoring
Technical Account Manager, Confluent
For 20 years, Florent Ramière has been creating software that is complex to build but easy to use. He is now a Technical Account Manager at Confluent, where he helps his customers succeed with Kafka. Coding is his passion, and for him, understanding what users want is a fun game. What matters to him is getting ambitious things done in a challenging technical and human environment. When he is not in front of a computer, he likes to go for long runs in the forest.
Alberto Guggiola - Quantmetry
Exploring graphs: looking for communities & leaders
Data Scientist - Quantmetry
Alberto earned a PhD in theoretical physics for his work on rare events taking place on graphs, and since then he has been trying to convince anybody (clients, colleagues, relatives, people on the street) of the added value of this approach.
Adrien Morvan - Photobox
Transforming pictures into memories
Machine learning engineer - Photobox
Adrien is an ML engineer at Photobox. He has worked on various computer vision topics, such as simultaneous localisation and mapping and face recognition. He now focuses on topics such as recommendation, to deliver automated solutions at scale.
Jacek Laskowski
The internals of query execution in Spark SQL
Software Developer
Jacek is an independent consultant, software developer, and technical instructor specializing in Apache Spark, Apache Kafka, and Kafka Streams (with Scala, sbt, Kubernetes, DC/OS, Apache Mesos, and Hadoop YARN). He offers software development and consultancy services, with very hands-on, in-depth workshops and mentoring.
Samya Barkaoui - Toucan-Toco
Visualizing algorithms
Head of data, Toucan-Toco
Samya has 6 years of experience in data. She started out at a consulting company specialized in data science. She studied at the French engineering school Ecole des Mines de Paris, where she specialized in statistics. She supervises statistics projects with ENSAE students.
Thomas Lamirault - BlaBlaCar
Real-Time Access log analysis
Software Architect - BlaBlaCar
Data Software Architect at BlaBlaCar, Thomas has been in the IT industry for 11 years. Besides being a passionate Java developer, he has worked as a data engineer for Ericsson and Bouygues Telecom. He now brings his passion for and experience with Flink and Beam to building the next data platform at BlaBlaCar.
Pauline Ballereau - Air France
A data scientist journey to industrialization of machine learning
Data Scientist, Air France
Pauline is a data scientist at Air France-KLM. She is currently working on recommender systems and digital analytics projects. She holds an MS degree in data science and operations research.
Matthieu Blanc - Zeenea
Data lineage: visualize the data life cycle
VP Product - Zeenea
Matthieu has a data architect background. He co-founded Zeenea in 2017. The company publishes a data catalog connected to Big Data systems, which centralizes data and metadata to provide a self-service, collaborative data solution. Matthieu is the VP Product of Zeenea.
Aurélia Nègre - Quantmetry
Exploring graphs: looking for communities & leaders
Data Scientist - Quantmetry
Aurélia Nègre has a background in statistics, and prior to working at Quantmetry, she worked at the French Central Bank, where she designed and implemented credit risk models for structured products.
Ana Peleteiro Ramallo - Tendam
The wonders of deep learning: how to leverage it for natural language processing
Data Science Director, Tendam
I am currently the Data Science Director at Tendam, where I lead the data science initiatives in the company. Prior to that, I was a Senior Data Scientist at Zalando, where I built data-driven products that provided fashion insights using Machine Learning and Deep Learning. I hold a PhD in Artificial Intelligence, have 30+ international peer-reviewed publications, and have spoken at 10+ international conferences. I am a firm advocate of knowledge sharing, as well as of promoting women in tech initiatives.
Cristina Oprean - Photobox
Transforming pictures into memories
Machine learning engineer - Photobox
Cristina works as an R&D engineer in machine learning at Photobox. She earned a PhD focused on handwriting recognition from Telecom ParisTech. Her current work centers on adapting and applying the state of the art in computer vision and recommendation to Photobox's innovative products.
Pablo Lopez - Xebia
Computer vision: a pragmatic alliance between deep learning and a more “traditional” technique.
CTO - Xebia
Pablo has strong knowledge of software development and is passionate about technology in general. He is always eager to discover new fields of interest, so pairing with a data scientist was, for him, a way to set foot in a new playing field.
Pierre Schmidt - Toucan-Toco
Visualizing algorithms
Lead backend developer, Toucan-Toco
Pierre has 10 years of experience as a data engineer. He started out in industrial manufacturing automation software in London. Later, in Paris, he moved into data engineering and distributed systems for web applications at Sen.se and Deezer. He studied at Ecole Normale Supérieure, where he focused on logic and its applications in computer science.
Kevin Nelson - Google
A crash course on Google Cloud AutoML and machine learning APIs
Developer Advocate, Google Cloud
Kevin is a Google Cloud Developer Advocate focused on storage and machine learning. Before joining the Cloud team, he was a lead Product Manager on Google Drive. Prior to joining Google in 2014, he was an entrepreneur with over 20 years of experience building and managing software and SaaS companies. In addition to his work at Google, he sits on the board of Quantum Scientific Imaging, a company he co-founded that designs and manufactures scientific CCD cameras for applications requiring superior image performance, such as astronomical and medical imaging.
Nicolas Laille - Xebia
A data scientist journey to industrialization of machine learning
Data Engineer, Xebia
I started as a back-end developer before diving deep into big data as a data engineer. I am now helping Air France industrialize its data science projects.
Pierre Villard - Hortonworks
How to deal with workflows lifecycle in Apache NiFi?
Solution Architect - Hortonworks
Involved in the Apache NiFi community since 2015, Pierre is a committer and PMC member of the Apache NiFi project. Sincerely convinced by the open source software model, he has also been a Solution Architect at Hortonworks since 2016.
Sylvain Lequeux - Xebia
Tensors in the sky with CloudML
Data Engineer - Xebia
Sylvain is a Data Engineer at Xebia. He teaches the Cloudera Administrator and Machine Learning with Spark training courses. He is a certified Cloudera Developer and a Software Craftsmanship enthusiast.
Olivier Bergeret - AWS
Machine learning models at scale with Amazon SageMaker
Solutions Architect Manager, AWS
Solutions Architect Manager and Data/AI specialist at AWS, Olivier created two 1-day discovery workshops dedicated to AI and Big Data and authored content for SageMaker, the managed platform that enables developers and data scientists to quickly and easily build, train, and deploy machine learning models at any scale. Before joining AWS, Olivier was the CTO of lacentrale.fr; over the past 18 years, he has also been a Dev Manager and a developer for many startups and medium-sized media companies.
Samah Ghalloussi - Ministère des solidarités et de la santé
Enhancing medical student practice with patient-like chatbots
Data scientist / Entrepreneur, Ministère des solidarités et de la santé
Samah worked for 3 years at the French Atomic Energy Commission (CEA) in the Natural Language Processing Lab, where she contributed to several Machine Learning projects as well as to the creation of chatbots. She then joined the start-up Stryng Messaging Inc. in June 2017 to add Artificial Intelligence to this new messaging app dedicated to professionals. She is now a data scientist for the French Ministry of Health as a Public Interest Entrepreneur.
Pierre Sendorek - Xebia
Computer vision: a pragmatic alliance between deep learning and a more “traditional” technique.
Data Scientist - Xebia
Pierre Sendorek is passionate about machine learning and signal processing. He holds a PhD in signal processing, a master's degree in applied mathematics, and an engineering diploma. He currently works as a Data Scientist at Xebia.
Speaker to be announced soon
Data Lover
Contact and access
The conference will take place at the Pan Piper, in the 11th arrondissement of Paris, a short walk from Philippe Auguste station (line 2) or a 5-minute walk from Charonne station (line 9).