Technical Specialist for Data Scientists

Before organising a course or seminar, we listen to the real needs and objectives of each client, in order to adapt the training and get the most out of it. We tailor each course to your needs.

We also specialise in in-company training adapted to the needs of each organisation, where the benefit for several attendees from the same company is much greater. If this is your case, contact us.

Course dates

We also provide a Cloud platform with all the tools installed and configured, ready for the training, including exercises, databases, etc., so that no time is lost on initial preparation and setup. All you have to worry about is learning!

We also offer the option of training based on 'Use Cases'.

The traditional training format (syllabus, hours, instructor) is complemented by practical cases carried out in the weeks after the course using the organisation's own real data, so that initial projects can be put into production with our support, development assistance, and reviews with students and teams.

Over the last 10 years, we have trained more than 250 organisations and 3,000 students!

Oh, and we give our famous Data Ninjas T-shirts to all attendees. Don't miss out on yours!

Goal

This course is designed to train highly qualified specialists in the field of Big Data technologies.

The aim of this course is to provide a detailed, practical and technical view of Big Data, and to put the techniques and technologies studied into practice.

Through practical exercises, we will discuss examples of architectures already implemented in the market and use cases where Big Data is and has been instrumental.

Big Data (the handling of large volumes of information) refers to datasets that grow so large that they become awkward to work with using traditional database management tools. Difficulties include capture, storage, search, sharing, analysis, and visualization.

If this trend continues, driven by the benefits of working with larger datasets that allow analysts to "spot business trends, prevent diseases, combat crime", new technologies that support it (NoSQL, Hadoop ...) will be needed.

Big Data relies on heterogeneous but complementary technologies to achieve these goals (Hadoop, NoSQL, column-oriented DBs, SQL databases ...), along with powerful visualization tools, also open source.

Target audience

Aimed at engineers with previous knowledge in the field of data analysis, statistics, etc.

Observations

An updated collection of the information we have published about Big Data

At Stratebi (contact us for a presentation), we combine the power of Business Intelligence with Big Data to provide Big Data Analytics solutions in areas such as:

Social media solutions (Twitter, Facebook ...)
Web analytics solutions: log analysis, e-commerce, etc.
Smart City / Open Data solutions
Geolocation solutions, mobile devices ...
Solutions for fraud detection, auditing, and system performance levels
Security and financial analysis solutions for Retail, Telco, Banking and Insurance
Advanced solutions for customer segmentation, leads and automation of commercial activities
Business Intelligence (reports, dashboards, OLAP ...) in real time
Analysis solutions for utilities and sensors (energy, water, pollution, light ...)
Solutions for detecting buying patterns, recommendations, etc.
Joint solutions for structured and unstructured data: bulk loading and analysis of information not previously handled

Thanks to Open Source technologies, implementing these solutions represents a huge cost saving over proprietary solutions. For example: Big Data Analytics with Pentaho

View document

Syllabus

1. Introduction to Big Data

Main principles on which Big Data is based.
Historical overview and introduction to the context of Big Data through intuitive examples.
How Big Data affects businesses.
The relationship between Big Data, Business Intelligence & Data Science.

2. Big Data Architectures

Introduction to and classification of the different Big Data architectures and systems available on the market
In-depth study of the Hadoop environment: HDFS, MapReduce, YARN, the stack of analysis tools available on top of HDFS and MapReduce (Hive, Pig ...), introduction to Hadoop distributions, etc.
Study of the main NoSQL solutions: Cassandra, MongoDB, ...
Introduction to analytical databases: HP Vertica and MonetDB
Considerations for choosing a Big Data architecture
Practical examples and future outlook for these databases
- Installing a single-node Hadoop distribution for testing
- Introduction to Hadoop cluster management
- Introduction to the HDFS file system
- Installation of and introduction to using Cassandra
- Installation of and introduction to using MongoDB

3. Data acquisition and movement in Big Data

Study of the main types of existing data sources
- Structured, semi-structured and unstructured data
- Batch and streaming
Analysis of the main tools available for data acquisition and movement
- Pentaho Data Integration: Loading, processing and extraction of data of any kind from data sources to HDFS and vice versa
- Sqoop: Batch loading and extraction of relational data (RDBMS -> HDFS, HDFS -> RDBMS)
- Flume: Loading and processing of real-time data
- Kafka: Distributed queuing system that allows data acquisition in real time
Exercises with the above tools, based on a case study for obtaining data from logs, social networks ...
Exercises on data movement and transformation with Pentaho Data Integration, including data from MongoDB

4. Processing of Big Data

Study of the time requirements of the analysis (timeliness of the analysis)
Introduction to the main tools for processing and analysis of Big Data
- Tools on MapReduce: Pig, Hive
- Tools that do not use MapReduce: Spark, Spark Streaming, Storm ...
Exercises based on a case study for processing data from logs, social networks ...
Introduction to Spark and the Scala language
- Exercises with Spark and the Scala language on Hadoop
- Analysis of the possibilities of integration with different Big Data architectures: Cassandra, MongoDB, etc.

5. Introduction to Big Data processing in real time

Analysis of solutions for analyzing real-time data in Hadoop: Spark Streaming and Storm
Integration with other tools of the Hadoop environment

6. Introduction to Machine Learning & Big Data

Introduction to Machine Learning: Definition of Machine Learning and study of the main methods
Machine Learning with Spark: Techniques and tools for developing applications with Spark Machine Learning and MLlib
Exercise on developing a simple algorithm with Spark Machine Learning and MLlib

7. Case Studies

Analysis of market case studies: Amazon's recommendation system, analysis of sensor data from transport companies, analysis of clicks on web pages ...
Analysis of case studies based on our extensive experience in the development of Big Data projects

Announcement