If you do a search on the term Big Data, you will invariably fall on Hadoop. Traditional data warehouses and relational databases treat structured data and can store huge amounts of data, but the structuring requirements limit the type of data that can be processed. Hadoop is designed to handle large amounts of data, regardless of structure.
The MapReduce infrastructure that constitutes the core of Hadoop was created by Google in response to the problem of creating search indexes on the Web. MapReduce spreads calculations across multiple nodes, solving the problem of data that is too large to fit on a single machine. Associated with Linux servers, this technique is an economical alternative to very large spreadsheets.
The Hadoop Distributed File System (HDFS), in the case of a server failure, does not interrupt the processing treat by redundant data replication across the cluster. No restrictions are imposed on the data stored in the HDFS system: they can be unstructured and without a schema.
Conversely, relational databases require, prior to any registration, that the data will be structured and the schemas defined. With HDFS, the developer code is responsible for interpreting the data.
No comments:
Post a Comment