Continued Teaching

Now I am preparing to teach a student about data warehouses. These are technologies that aggregate data from multiple sources so that it can be compared and analyzed for various purposes. They can:

Hold data for long periods of time
Optimize operations for reading data
Hold data that may lag and not be updated in real time
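
To make the idea of aggregating multiple sources concrete, here is a minimal Python sketch. It is not tied to any real warehouse product: the two source lists, the field names (`store`, `amount`), and both helper functions are made up for illustration.

```python
from collections import defaultdict

# Hypothetical records from two separate source systems (all values are made up).
online_orders = [
    {"store": "north", "amount": 120.0},
    {"store": "south", "amount": 80.0},
]
in_store_sales = [
    {"store": "north", "amount": 45.5},
    {"store": "south", "amount": 60.0},
    {"store": "north", "amount": 10.0},
]

def load_into_warehouse(*sources):
    """Combine records from every source into one list, tagging where each came from."""
    warehouse = []
    for name, records in sources:
        for record in records:
            warehouse.append({**record, "source": name})
    return warehouse

def total_by_store(warehouse):
    """A read-only analytical query: total amount per store across all sources."""
    totals = defaultdict(float)
    for row in warehouse:
        totals[row["store"]] += row["amount"]
    return dict(totals)

warehouse = load_into_warehouse(("online", online_orders), ("in_store", in_store_sales))
totals = total_by_store(warehouse)
print(totals)
```

Once the data is loaded, the query only reads it, which is the kind of workload a warehouse is tuned for.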

There are many data warehouse products, from vendors like Vertica, Teradata, Oracle, and IBM. There is also Apache Hive, an open-source warehouse and the main one my student and I will be going over in this session. It is part of the larger Hadoop ecosystem.

*Hadoop: a distributed computing framework for processing millions of records. Hadoop works roughly like this:

Store millions of records across multiple machines
Run processes on multiple machines to crunch data
Handle fault tolerance/machine crashes

Hive stores its data in Hadoop as files (text or binary). The files are partitioned across machines and kept in more than one copy to prevent data loss.
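
The steps above can be sketched in a few lines of Python. This is a toy, single-process illustration and not how Hadoop is actually implemented: the function names, the round-robin split, and the replication factor of 2 are all assumptions chosen to keep the example short.

```python
# Toy illustration only; real Hadoop does this across networked machines.
REPLICATION = 2  # assumed number of copies kept of each chunk

def partition(records, num_machines):
    """Split records into one chunk per machine, round-robin."""
    chunks = [[] for _ in range(num_machines)]
    for i, record in enumerate(records):
        chunks[i % num_machines].append(record)
    return chunks

def place_replicas(num_chunks, replication=REPLICATION):
    """Map each chunk index to the machines that hold a copy of it."""
    return {
        c: [(c + k) % num_chunks for k in range(replication)]
        for c in range(num_chunks)
    }

def crunch(chunk):
    """The per-machine work; here it just sums the numbers in the chunk."""
    return sum(chunk)

def run_job(records, num_machines, crashed=frozenset()):
    """Process every chunk on a surviving machine and combine the partial results."""
    chunks = partition(records, num_machines)
    placement = place_replicas(num_machines)
    partials = []
    for c, machines in placement.items():
        alive = [m for m in machines if m not in crashed]
        if not alive:
            raise RuntimeError(f"all copies of chunk {c} are lost")
        # In a real cluster the work would be sent to machine alive[0].
        partials.append(crunch(chunks[c]))
    return sum(partials)

records = list(range(1, 101))
result_ok = run_job(records, num_machines=4)
result_crash = run_job(records, num_machines=4, crashed={2})
print(result_ok, result_crash)
```

With one machine down, the job still finishes because every chunk has a second copy elsewhere. If both machines holding the same chunk crash, the sketch raises an error instead of returning a wrong total.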

March 2, 2020


Written by tyler775