Before developing a course, we listen to each client's real needs and objectives so that we can tailor the training and maximize its return. We adjust each course to your needs.
We also specialize in in-company training tailored to the needs of each organization, where several participants from the same company get considerably more out of the course. If this is your case, contact us.
Big Data Introduction
This course is designed to introduce and explain the main concepts and technologies of the Big Data field.
The aim of this course is to provide a holistic view of Big Data, focusing on its ability to generate new business opportunities and optimize existing ones.
We will analyze examples of architectures already deployed in the market and use cases in which Big Data has been, and will be, decisive.
Big Data (the handling of large volumes of information) refers to datasets that grow so large that they become awkward to work with using traditional database management tools. Difficulties include capture, storage, search, sharing, analysis, and visualization.
If this trend continues, driven by the benefits of working with ever larger datasets that allow analysts to "spot business trends, prevent diseases, combat crime", new technologies (NoSQL, Hadoop ...) will be needed to support it.
Big Data relies on heterogeneous but complementary technologies to achieve these goals (Hadoop, NoSQL, column-oriented databases, SQL databases ...), along with powerful visualization tools, also open source.
Updated collection of the information we have published about Big Data
At Stratebi (contact us for a presentation), we combine the power of Business Intelligence with Big Data to provide Big Data Analytics solutions in areas such as:
- Social media solutions (Twitter, Facebook ...)
- Web analytics solutions: log analysis, e-commerce, etc.
- Smart City / Open Data solutions
- Geolocation solutions, mobile devices ...
- Solutions for fraud detection, auditing, and system performance levels
- Security and financial analysis solutions for Retail, Telco, Banking and Insurance
- Advanced solutions for customer segmentation, leads, and automation of commercial activities
- Business Intelligence (reports, dashboards, OLAP ...) in real time
- Analysis solutions for utilities and sensors (energy, water, pollution, light ...)
- Solutions for detecting buying patterns, recommendations, etc.
- Solutions that combine unstructured and structured information, with bulk loading and analyses not previously carried out
Thanks to Open Source technologies, implementing these solutions represents a huge cost saving over proprietary solutions. Example: Big Data Analytics with Pentaho
1. Introduction to Big Data
- Main principles on which Big Data is based.
- Historical overview and introduction to the context of Big Data through intuitive examples.
- How Big Data affects businesses.
- The relationship between Big Data, Business Intelligence & Data Science
2. Big Data Architectures
- Introduction to and classification of the different Big Data architectures and systems available on the market
- In-depth study of the Hadoop environment: HDFS, MapReduce, YARN, the stack of analysis tools available on top of HDFS and MapReduce (Hive, Pig ...), introduction to Hadoop distributions, etc.
- Study of the main NoSQL solutions: Cassandra, MongoDB, ...
- Introduction to analytical databases: HP Vertica and MonetDB
- Considerations for choosing a Big Data architecture
- Practical examples and future outlook for these databases
- Installing a single-node Hadoop distribution for testing
- Introduction to Hadoop cluster management
- Introduction to the HDFS file system
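As a rough illustration of the MapReduce model covered in this module, the sketch below simulates the map, shuffle and reduce phases in plain Python on a single machine. It does not use Hadoop, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, chosen only for this example.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would do between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence on an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data big ideas", "data moves fast"]))
```

On a real cluster the map and reduce functions would run in parallel on different nodes over blocks stored in HDFS; here everything runs in one process so that only the programming model is visible.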
3. Data acquisition and movement in Big Data
- Study of the main types of existing data sources
- Structured, semi-structured and unstructured data
- Batch and streaming
- Analysis of the main tools available for data acquisition and movement:
- Pentaho Data Integration: loading, processing and extraction of data from any kind of data source to HDFS and vice versa.
- Sqoop: batch loading and extraction of relational data (RDBMS -> HDFS, HDFS -> RDBMS)
- Flume: loading and processing of real-time data
- Exercises with the above tools, based on a case study, to obtain data from logs, social networks ...
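To make the distinction between structured, semi-structured and unstructured sources more concrete, here is a minimal sketch using only the Python standard library. The sample inputs, the log layout and the function names are assumptions made for illustration; they are not taken from the course material or from any of the tools listed above.

```python
import csv
import io
import json
import re


def parse_structured(csv_text):
    """Structured: fixed columns, so each row maps directly to named fields."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def parse_semi_structured(json_text):
    """Semi-structured: self-describing, but fields may vary between records."""
    return json.loads(json_text)


# Hypothetical log layout: <ip> "<method> <path>" <status>
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) "(?P<method>\S+) (?P<path>\S+)" (?P<status>\d{3})'
)


def parse_raw_text(log_line):
    """Raw text (treated here as unstructured): fields must be extracted explicitly."""
    match = LOG_PATTERN.search(log_line)
    return match.groupdict() if match else None


if __name__ == "__main__":
    print(parse_structured("id,name\n1,Ana\n"))
    print(parse_semi_structured('{"user": "ana", "tags": ["bi", "hadoop"]}'))
    print(parse_raw_text('10.0.0.1 "GET /index.html" 200'))
```

The less structure a source carries, the more parsing logic the acquisition step has to supply before the data can be loaded and analyzed.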
4. Processing of Big Data
- Analysis of the timing requirements of the analysis (timeliness of results)
- Introduction to the main tools for processing and analysis of Big Data
- Tools built on MapReduce: Pig, Hive
- Tools that do not use MapReduce: Spark, Spark Streaming, Storm ...
- Exercise based on a case study for processing data from logs, social networks ...
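The difference between batch and streaming processing, which drives the timing requirements discussed in this module, can be sketched in a few lines of plain Python. This is a conceptual illustration only: it does not use Spark, Storm or any other tool named above, and the event format and names (`batch_count`, `StreamingCounter`) are hypothetical.

```python
from collections import Counter


def batch_count(events):
    """Batch: the full dataset is available, so it is processed in one pass."""
    return Counter(event["status"] for event in events)


class StreamingCounter:
    """Streaming: events arrive one at a time and the result is updated incrementally."""

    def __init__(self):
        self.counts = Counter()

    def update(self, event):
        self.counts[event["status"]] += 1
        # An up-to-date result is available after every single event.
        return self.counts


if __name__ == "__main__":
    events = [{"status": "200"}, {"status": "404"}, {"status": "200"}]
    print(batch_count(events))

    stream = StreamingCounter()
    for event in events:
        stream.update(event)
    print(stream.counts)
```

Both approaches reach the same totals; what changes is when the answer becomes available, which is the deciding factor when results are needed in near real time.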
5. Case Studies
- Analysis of market case studies: the Amazon recommendation system, analysis of sensor data from transport companies, analysis of clicks on web pages ...
- Analysis of case studies based on our extensive experience in the development of Big Data projects