Before developing a course, we listen to each client's real needs and objectives so that the training can be tailored to deliver the highest return. We adapt each course to your needs.
We also specialize in in-company training tailored to the needs of each organization, where the benefit for several participants from the same company is much greater. If this is your case, contact us.
Technical specialist for Data Scientist
This course is designed to train highly qualified specialists in the field of Big Data technologies.
The aim of this course is to provide a detailed, practical, technical view of Big Data and to put the techniques and technologies studied into practice.
Through practical exercises, we will discuss examples of architectures already deployed in the market and use cases in which Big Data is, and has been, instrumental.
Big Data (the handling of very large volumes of information) refers to datasets that grow so large that they become awkward to work with using traditional database management tools. The difficulties include capture, storage, search, sharing, analysis, and visualization.
If this trend continues, driven by the benefits of working with larger datasets that allow analysts to "spot business trends, prevent diseases, combat crime", new technologies such as NoSQL and Hadoop will be needed to support it.
Big Data relies on heterogeneous but complementary technologies to achieve these goals (Hadoop, NoSQL, column-oriented databases, SQL databases...), along with powerful visualization tools, which are also open source.
An up-to-date collection of the information we have published about Big Data
At Stratebi (contact us for a presentation), we combine the power of Business Intelligence with Big Data to provide Big Data Analytics solutions in areas such as:
- Social media solutions (Twitter, Facebook...)
- Web analytics solutions: log analysis, e-commerce, etc.
- Smart City / Open Data solutions
- Geolocation and mobile device solutions...
- Solutions for fraud detection, auditing, and system performance
- Security and financial analysis solutions for Retail, Telco, Banking and Insurance
- Advanced solutions for customer segmentation, leads, and automation of commercial activities
- Real-time Business Intelligence (reports, dashboards, OLAP...)
- Analysis solutions for utilities and sensors (energy, water, pollution, lighting...)
- Solutions for detecting buying patterns, recommendations, etc.
- Solutions combining structured and unstructured data, with bulk loading and analyses not previously performed
Thanks to Open Source technologies, implementing these solutions represents huge cost savings over proprietary solutions (e.g., Big Data Analytics with Pentaho).
1. Introduction to Big Data
- The main principles on which Big Data is based.
- Historical overview and introduction to the context of Big Data through intuitive examples.
- How Big Data affects businesses.
- The relationship between Big Data, Business Intelligence & Data Science.
2. Big Data Architectures
- Introduction to and classification of the different Big Data architectures and systems available on the market
- In-depth study of the Hadoop environment: HDFS, MapReduce, YARN, the stack of analysis tools available on top of HDFS and MapReduce (Hive, Pig...), introduction to Hadoop distributions, etc.
- Study of the main NoSQL solutions: Cassandra, MongoDB, ...
- Introduction to analytical databases: HP Vertica and MonetDB
- Considerations for choosing a Big Data architecture
- Practical examples and the future outlook for these databases
- Installing a single-node Hadoop distribution for testing
- Introduction to Hadoop cluster management.
- Introduction to the HDFS file system
- Installing Cassandra and getting started with it
- Installing MongoDB and getting started with it
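The MapReduce model covered in this module splits a job into a map phase, a shuffle that groups the intermediate key/value pairs by key, and a reduce phase that combines the values for each key. The sketch below illustrates those three steps with the classic word-count job. It is a minimal, single-process model, not the Hadoop API: the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical, and everything runs in memory, whereas Hadoop distributes the same steps across a cluster over data stored in HDFS.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(lines: Iterable[str]) -> Iterator[tuple[str, int]]:
    """Map (hypothetical helper): emit a (word, 1) pair for every word of every line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all emitted values by key, as the framework does between phases."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: dict[str, list[int]]) -> dict[str, int]:
    """Reduce: combine the values of each key, here by summing the counts."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines: Iterable[str]) -> dict[str, int]:
    """Run the three phases in sequence, in a single process."""
    return reduce_phase(shuffle(map_phase(lines)))


if __name__ == "__main__":
    docs = ["big data needs new tools", "hadoop stores big data"]
    print(word_count(docs))
```

In a real cluster the map and reduce functions run in parallel on many nodes, and only the shuffle moves data between them, which is why keeping the intermediate pairs small matters for performance.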
3. Data acquisition and data movement in Big Data
- Study of the main types of existing data sources
- Structured, semi-structured and unstructured data
- Batch and streaming
- Analysis of the main tools available for data acquisition and movement
- Pentaho Data Integration: loading, processing, and extracting data of any kind from data sources into HDFS and vice versa
- Sqoop: batch loading and extraction of relational data (RDBMS -> HDFS, HDFS -> RDBMS)
- Flume: loading and processing real-time data
- Kafka: a distributed queuing system that enables real-time data acquisition
- Exercises with the above tools based on a case study: acquiring data from logs, social networks...
- Exercises on data movement and transformation with Pentaho Data Integration and MongoDB
4. Processing of Big Data
- Analysis of time requirements (timeliness of the analysis)
- Introduction to the main tools for processing and analysis of Big Data
- Tools on MapReduce: Pig, Hive
- Tools that do not use MapReduce: Spark, Spark Streaming, Storm ...
- Exercises based on a case study for processing log data, social network data...
- Introduction to Spark and the Scala language
- Exercises with Spark and the Scala language on Hadoop
- Analysis of integration options with different Big Data architectures: Cassandra, MongoDB, etc.
5. Introduction to Big Data processing in real time
- Analysis of solutions for analyzing real-time data in Hadoop: Spark Streaming and Storm
- Integration with other tools of the Hadoop environment
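Real-time tools such as Spark Streaming and Storm are typically used to compute aggregates over a moving time window of a stream (for example, clicks per page over the last few seconds). The sketch below models that idea in a single Python process, assuming events arrive in timestamp order. `SlidingWindowCounter` is a hypothetical helper, not part of Spark or Storm, which additionally distribute the work across a cluster and recover from failures.

```python
from collections import Counter, deque


class SlidingWindowCounter:
    """Hypothetical helper: counts keys seen in the last `window_seconds` of event time."""

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        self.events: deque = deque()  # (timestamp, key) pairs, oldest first
        self.counts: Counter = Counter()

    def add(self, timestamp: float, key: str) -> None:
        """Record one event; assumes events arrive in non-decreasing timestamp order."""
        self.events.append((timestamp, key))
        self.counts[key] += 1
        self._evict(timestamp)

    def _evict(self, now: float) -> None:
        """Drop events outside the window (now - window_seconds, now]."""
        while self.events and self.events[0][0] <= now - self.window_seconds:
            _, old_key = self.events.popleft()
            self.counts[old_key] -= 1
            if self.counts[old_key] == 0:
                del self.counts[old_key]

    def snapshot(self) -> dict:
        """Current per-key counts inside the window."""
        return dict(self.counts)


if __name__ == "__main__":
    window = SlidingWindowCounter(window_seconds=10)
    for ts, key in [(0, "click"), (3, "view"), (9, "click"), (12, "click")]:
        window.add(ts, key)
    print(window.snapshot())
```

Each event is added once and evicted once, so the cost per event stays constant however long the stream runs; the trade-off is that the window's events must be kept in memory.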
6. Introduction to Machine Learning & Big Data
- Introduction to Machine Learning: Definition of Machine Learning and study of the main methods
- Machine Learning with Spark: techniques and tools for developing applications with Spark Machine Learning and MLlib
- Exercise: developing a simple algorithm with Spark Machine Learning and MLlib
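As an illustration of the kind of simple algorithm such an exercise might cover, here is a minimal pure-Python k-means (Lloyd's algorithm) on 2-D points. MLlib provides its own distributed KMeans implementation; this sketch does not use Spark, and the function name `kmeans`, the caller-supplied initial centroids, and the iteration limit are illustrative assumptions.

```python
import math
from typing import Sequence

Point = tuple[float, float]


def kmeans(points: Sequence[Point], centroids: Sequence[Point], max_iter: int = 100) -> list[Point]:
    """Lloyd's algorithm: alternate assignment and update steps until centroids stop moving."""
    centers = list(centroids)
    for _ in range(max_iter):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters: list[list[Point]] = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its points
        # (an empty cluster keeps its previous centroid).
        new_centers = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:  # converged: no centroid moved
            break
        centers = new_centers
    return centers


if __name__ == "__main__":
    data = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (9, 8.5), (8.5, 9)]
    print(kmeans(data, [(0.0, 0.0), (10.0, 10.0)]))
```

In Spark the assignment step runs in parallel over the partitions of the dataset and only the per-cluster sums and counts are combined, which is what makes the same algorithm practical at Big Data scale.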
7. Case Studies
- Analysis of market cases: the Amazon recommendation system, analysis of sensor data from transport companies, analysis of clicks on web pages...
- Analysis of case studies based on our extensive experience in the development of Big Data projects