Data Management

The Big in Big Data is Relative.

Data management within Business Intelligence and Big Data is more than just gathering data into the Big Data environment. It is about integrating that capability into the organization's existing IT ecosystem. No single piece of software, data product, or tool will provide all the answers. The goal is to use the existing infrastructure efficiently and elegantly to provide both historical and current perspectives. The unstructured data and the archival data from our databases, both collected in the Big Data environment, work together with the online databases to give a data-driven picture of the enterprise. There is no single data warehouse. Instead, a collection of data stores comes together to form a logical, or virtual, data warehouse.

When we mention unstructured data, we are referring to machine data such as logs and text files from computers, routers, and firewalls. It also includes emails and sensor data from shop floor equipment, as well as data from monitors throughout the enterprise. At this stage it is important not to worry too much about which data to include. Our primary goal is to include everything we can get our hands on. As we approach storage limits, we will then decide how to prioritize data collection and how long a timeframe will be available for our queries. It is sometimes better to limit the timeframe of collected data than the variety of data.
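A minimal sketch of the "limit the timeframe, not the variety" trade-off, assuming hypothetical per-source daily ingest volumes and a fixed storage budget. The source names, volumes, and the `retention_days` helper are illustrative only, not taken from any product. Every source is kept, and only the retention window shrinks until the total fits the budget.

```python
from dataclasses import dataclass


@dataclass
class Source:
    name: str
    gb_per_day: float  # hypothetical average daily ingest volume


def retention_days(sources: list[Source], budget_gb: float) -> int:
    """Return the longest whole-day retention window that keeps every
    source within the storage budget (variety kept, timeframe reduced)."""
    daily_total = sum(s.gb_per_day for s in sources)
    if daily_total <= 0:
        raise ValueError("total daily volume must be positive")
    return int(budget_gb // daily_total)


# Hypothetical machine-data sources and volumes, for illustration only.
sources = [
    Source("firewall_logs", 40.0),
    Source("router_logs", 15.0),
    Source("email", 25.0),
    Source("shop_floor_sensors", 20.0),
]

print(retention_days(sources, budget_gb=36_500))
```

With these made-up numbers the sources produce 100 GB per day, so a 36,500 GB budget holds roughly a year of history for all four sources, rather than a longer history for only some of them.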

Structured data comes from SQL databases such as Oracle, Microsoft SQL Server, IBM DB2, and MySQL. It can also come from other sources such as Cognos and Teradata. Older structured data should be collected into our Big Data environment, while more recent structured data remains available to our queries from the online databases.
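One way to picture the split between archived and online structured data is a small query router. It sends the older part of a date range to the archive and the recent part to the online database, then combines the results. This is a hypothetical sketch. Two in-memory SQLite databases stand in for the Big Data environment and the online database, and the `orders` table, the `CUTOFF` date, and `query_orders` are invented for illustration.

```python
import sqlite3
from datetime import date

# Hypothetical cutoff: rows older than this live in the archive (standing in
# for the Big Data environment); newer rows stay in the online database.
CUTOFF = date(2024, 1, 1)

SCHEMA = "CREATE TABLE orders (order_date TEXT, amount REAL)"


def make_store(rows):
    """Create an in-memory SQLite store with the hypothetical orders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    return conn


def query_orders(archive, online, start: date, end: date):
    """Return (order_date, amount) rows with start <= order_date < end,
    reading each part of the range from the store that holds it."""
    sql = ("SELECT order_date, amount FROM orders "
           "WHERE order_date >= ? AND order_date < ? ORDER BY order_date")
    results = []
    if start < CUTOFF:  # part of the range falls in archived history
        upper = min(end, CUTOFF)
        results += archive.execute(sql, (start.isoformat(), upper.isoformat())).fetchall()
    if end > CUTOFF:  # part of the range falls in current data
        lower = max(start, CUTOFF)
        results += online.execute(sql, (lower.isoformat(), end.isoformat())).fetchall()
    return results


# ISO-formatted date strings sort chronologically, so text comparison works.
archive = make_store([("2023-06-15", 120.0), ("2023-11-30", 80.0)])
online = make_store([("2024-01-10", 200.0), ("2024-03-05", 50.0)])

print(query_orders(archive, online, date(2023, 10, 1), date(2024, 2, 1)))
```

Keeping the cutoff in one place means that moving it forward, as more data is archived, changes where queries are sent without changing how callers ask for data.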