Description of the Problem
Any business that processes vast amounts of data is likely to face one major challenge: the storage and processing of that data. Inadequate storage and processing mechanisms create headaches for the company's managers, because most of the data is still needed and erasing it would not be advisable (Jarke, Lenzerini, Vassiliou, & Vassiliadis, 2013). Too much data combined with inadequate storage can also hamper information processing in the systems the business relies on. The lack of efficient and reliable storage and processing mechanisms therefore poses a massive problem for companies (Krishnan, 2013). The business enterprise has to implement big data solution frameworks to deal with these issues efficiently and improve the performance of its data warehousing.
Many companies face risks of inconsistency when they lack sufficient or reliable storage mechanisms, and the cost of maintaining data in their existing systems is high (Vera-Baquero, Colomo-Palacios, & Molloy, 2013). Meanwhile, their competitors are moving to new technologies that help them retain their data or migrate it to technologically advanced platforms (Lee, Lee, Choi, Chung, & Moon, 2012). Once a company runs short of storage, it can move its data to new tools available on the market, which helps assure the company that all its data is well secured and easily accessible. The company can use sources such as files to retrieve the data it needs and move it from one source to another. Collecting data from files so that the company's information is not lost helps the company maintain its data processing capacity and frees up space for storing new data.
List the analyses that are needed to solve the business problem.
Several big data analysis techniques can be applied to the storage and processing problem, including association rule learning, classification tree analysis, genetic algorithms, machine learning, regression analysis, sentiment analysis, and social network analysis (Lemke, Mindnich, Weyerhaeuser, Faerber, & Sattler, 2014). Association rule learning finds relationships between variables in large databases. Classification tree analysis assigns documents to the categories they belong to. Genetic algorithms use evolution-based techniques to solve a problem. Machine learning uses software that learns from existing data to make predictions based on specific traits and properties (Lee et al., 2012). Regression analysis manipulates an independent variable and observes how it affects a dependent variable (Krishnan, 2013). Sentiment analysis examines the sentiments people express about a particular topic, while social network analysis examines relationships between people across different backgrounds and fields.
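To make classification tree analysis and machine learning concrete, the following is a minimal sketch using scikit-learn's decision tree classifier; the feature columns and sample records are illustrative assumptions, not the company's actual data.

```python
# A minimal sketch of classification tree analysis with scikit-learn.
# The feature names and the sample records are placeholders.
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

# Hypothetical customer records: [monthly_spend, visits_per_month, complaints]
X = [
    [120.0, 4, 0],
    [15.0, 1, 2],
    [300.0, 9, 0],
    [40.0, 2, 1],
    [210.0, 6, 0],
    [10.0, 1, 3],
]
y = ["retained", "churned", "retained", "churned", "retained", "churned"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42
)

# Fit a small classification tree and score it on the held-out records.
tree = DecisionTreeClassifier(max_depth=3, random_state=42)
tree.fit(X_train, y_train)
print("held-out accuracy:", tree.score(X_test, y_test))
```

In practice the business would train such a model on records drawn from its warehouse rather than on hand-written lists.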
When a company is deciding which solutions to adopt, it should weigh the factors that favour the company before implementing them. One example is Hadoop, a solution well known for addressing storage and processing problems in big data-oriented companies. Hadoop, however, is sophisticated, and a company should have a team of competent engineers who can manage and troubleshoot the warehouse (Aji, Wang, Vo, Lee, Liu, Zhang, & Saltz, 2013). Another option is the technology referred to as Spark, which can solve many processing problems in a business, including live streaming, batch processing, graph data, and machine learning (Vera-Baquero, Colomo-Palacios, & Molloy, 2013). Spark can also boost the processing speed of the machines used in a company and is widely regarded as a solution to many big data problems; it allows the company to function efficiently by accelerating live streaming, batch processing, and machine learning workloads. Finally, Flink can be used; this mechanism supports many Internet of Things (IoT) workloads, which involve processing streams of data in companies.
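As a small illustration of the kind of batch processing Spark handles, the sketch below uses PySpark to aggregate a sales export; the file path and column names ("sales.csv", "region", "amount") are assumptions about the source data, not part of the original description.

```python
# A minimal PySpark sketch for batch processing, assuming pyspark is
# installed and a CSV export such as "sales.csv" (hypothetical path) exists.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("warehouse-batch-demo").getOrCreate()

# Read the raw export and aggregate revenue per region; the column names
# ("region", "amount") are assumptions about the source schema.
sales = spark.read.csv("sales.csv", header=True, inferSchema=True)
revenue = (
    sales.groupBy("region")
         .agg(F.sum("amount").alias("total_revenue"))
         .orderBy(F.desc("total_revenue"))
)
revenue.show()

spark.stop()
```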
Identify what data is required to solve the problem, the volume of the data, the velocity of the data, the variety of data.
When choosing the data to solve this problem, the organization should consider all data relating to its business. In terms of volume, the data should include records from various avenues, including the business's database, social media networks, digital images, and audio signals (Pääkkönen & Pakkala, 2015). In terms of velocity, this data should be processed in real time so that the usual business activities of the company run smoothly (Vera-Baquero, Colomo-Palacios, & Molloy, 2013). The data will also come in different varieties: unstructured data such as text messages, emails, documents, social media posts, photos, and video content, and structured data such as spreadsheets and relational databases containing addresses, dates, and numeric information.
Identify the legacy source systems to retrieve the required data, including the formats of the sources.
In retrieving information for big data, the business should consider different legacy sources. For example, the company can obtain information from its database, the internet, and its existing software (Nebot & Berlanga, 2012). These sources include relational databases (RDB), Extensible Markup Language (XML) files, non-RDB files, flat files, Portable Document Format (PDF) files, and web services.
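The sketch below shows, under stated assumptions, how a few of these legacy formats might be read with Python's standard library; the file names, table name, and XML tags are placeholders introduced for illustration.

```python
# A minimal sketch of pulling records from a few legacy source formats.
# The file names, table name, and XML tags are placeholders.
import csv
import sqlite3
import xml.etree.ElementTree as ET

# Relational database (RDB): read rows from a local SQLite file.
with sqlite3.connect("legacy.db") as conn:
    orders = conn.execute("SELECT id, customer, amount FROM orders").fetchall()

# Flat file: read a comma-separated export.
with open("customers.csv", newline="") as fh:
    customers = list(csv.DictReader(fh))

# XML file: pull out one element per record.
root = ET.parse("catalogue.xml").getroot()
products = [item.findtext("name") for item in root.iter("product")]

print(len(orders), "orders,", len(customers), "customers,", len(products), "products")
```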
Describe the data warehouse infrastructure to solve your business problem.
The data warehouse infrastructure the business will adopt to solve the problem includes Extract, Transform and Load (ETL) technology, Business Intelligence (BI) tools, and OLAP services (Krishnan, 2013). ETL technologies are the most complex part of data warehouse development. They retrieve data from its source, modify it through batch processing according to the warehouse's requirements, and load it into the warehouse (Nebot & Berlanga, 2012). This process turns raw data into data that supports the business's decision making.
Describe the ETL process to retrieve the data and load it.
The Extract, Transform and Load (ETL) process consists of extracting data, cleaning it, transforming it, and loading it into the data warehouse database (Lee et al., 2012). The extract step retrieves data from the source system for further processing. The cleaning step ensures that the data entering the warehouse meets set quality standards at minimum cost, which makes it an essential part of the ETL process. In the transformation step, a set of rules is applied to convert data from its source format into the form the warehouse expects. In the loading step, the data is carefully entered into the data warehouse's database, which is the final destination of the retrieved and processed data.
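The following is a minimal extract-clean-transform-load sketch using pandas and SQLite; the source path, column names, target table, and the illustrative tax rate are assumptions, not the business's actual rules.

```python
# A minimal ETL sketch: extract, clean, transform, load.
# Source path, column names, and the target table are assumptions.
import pandas as pd
import sqlite3

# Extract: pull raw rows from the source system's CSV export.
raw = pd.read_csv("source_export.csv")

# Clean: drop duplicates and rows missing the key business fields.
clean = raw.drop_duplicates().dropna(subset=["order_id", "amount"])

# Transform: apply simple business rules (normalise text, derive a column).
clean["region"] = clean["region"].str.strip().str.upper()
clean["net_amount"] = clean["amount"] * 0.84  # illustrative rule, not a real rate

# Load: write the processed rows into the warehouse database table.
with sqlite3.connect("warehouse.db") as conn:
    clean.to_sql("fact_orders", conn, if_exists="append", index=False)
```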
Describe how the data would be integrated.
Data in the data warehouse will be integrated at different levels. Data integration involves the techniques and practices of combining data from various sources into one unified source. The warehouse will integrate data in several ways. First, there will be manual integration, which involves synchronizing data with the data warehouse (Jarke et al., 2013), meaning the data sources are kept in step with the warehouse itself. Second, the Extract, Transform and Load (ETL) process will ensure that extracting, transforming, and loading information into the database is integrated within the data warehouse. Lastly, database replication will be used.
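As a small illustration of combining data from two sources into one unified view, the sketch below joins two hypothetical extracts on a shared key; the column names and the key "customer_id" are assumptions made for the example.

```python
# A minimal sketch of unifying records from two sources into one table.
# The column names and the join key ("customer_id") are assumptions.
import pandas as pd

crm = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "name": ["Ann", "Ben", "Cara"],
})
billing = pd.DataFrame({
    "customer_id": [1, 2, 4],
    "balance": [120.0, 35.5, 99.0],
})

# Integrate: join on the shared key so each customer appears once,
# keeping customers known to either source.
unified = crm.merge(billing, on="customer_id", how="outer")
print(unified)
```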
Identify the metadata that would be useful to solve the problem.
The company will adopt different metadata approaches to make finding and retrieving data easier. Metadata represents or describes other data; it is, in effect, a summary of the main document or data set (Tan, Blake, Saleh, & Dustdar, 2013). In this case, the metadata to be used includes the name of the data, the name of the author, the date created, the time modified, the file type, and the file size. These are the metadata fields that will improve the finding and retrieval of the business's data.
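A minimal sketch of collecting some of these metadata fields (name, file type, size, and modification time) from files on disk is shown below; the directory path is a placeholder, and fields such as author would have to come from the documents themselves rather than the file system.

```python
# A minimal sketch of collecting basic file metadata.
# The directory path "data_exports" is a placeholder.
from pathlib import Path
from datetime import datetime

def describe(path: Path) -> dict:
    """Return a small metadata record for one file."""
    info = path.stat()
    return {
        "name": path.name,
        "file_type": path.suffix or "unknown",
        "file_size_bytes": info.st_size,
        "modified": datetime.fromtimestamp(info.st_mtime).isoformat(),
    }

catalogue = [describe(p) for p in Path("data_exports").glob("*") if p.is_file()]
for record in catalogue:
    print(record)
```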
Describe the techniques to perform the analyses.
Business Intelligence (BI) tools support the process of analysing data and deriving insights that the business uses to make decisions that affect the company. Researchers and business analysts can use BI tools to form propositions based on the available data and answer those propositions using the existing data (Lemke et al., 2014). However, this process, as with ETL, is complicated and time-consuming because it involves processing raw data. Business Intelligence enables a business to turn raw data into optimized, organized data, which makes finding and retrieving data from the data warehouse a straightforward exercise.
With Online Analytical Processing (OLAP) servers, an organization can run complicated queries on huge data sets, with swift, stable, and interactive access to the available data. A good data warehouse either has an OLAP server or works together with one to facilitate online interaction (Krishnan, 2013). These interactions take the form of queries and reports made by clients, which enable the company to make well-informed business decisions based on customers' feedback and recommendations. The company can also use OLAP to prepare sales, budget, and finance reports, carry out strategic planning, and support any decision relating to sales and marketing.
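To show the kind of multi-dimensional summary an OLAP query produces, the sketch below rolls illustrative sales rows up along two dimensions with a pandas pivot table; the data and column names are assumptions made for the example.

```python
# A minimal OLAP-style sketch: summarising sales along two dimensions
# (region and quarter) with a pivot table. The data is illustrative.
import pandas as pd

sales = pd.DataFrame({
    "region":  ["North", "North", "South", "South", "North", "South"],
    "quarter": ["Q1", "Q2", "Q1", "Q2", "Q1", "Q1"],
    "amount":  [100.0, 150.0, 80.0, 120.0, 60.0, 40.0],
})

# Roll the detail rows up into a region-by-quarter summary, the kind of
# cross-tabulated view an OLAP server would serve interactively.
report = pd.pivot_table(
    sales, values="amount", index="region", columns="quarter",
    aggfunc="sum", margins=True, margins_name="Total",
)
print(report)
```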
Conclusion
In conclusion, it is evident that for a business to succeed in the modern era, especially one that handles a large volume of data, it needs to implement big data solution strategies. Managing this data will prove pivotal to the success of the business. Information needs to be integrated and unified into one source to make its retrieval easy, and in doing so the business should consider the volume, velocity, and variety of the data before setting up big data solutions in the data warehouse.
References
Aji, A., Wang, F., Vo, H., Lee, R., Liu, Q., Zhang, X., & Saltz, J. (2013). Hadoop GIS: A high performance spatial data warehousing system over MapReduce. Proceedings of the VLDB Endowment, 6(11), 1009-1020.
Jarke, M., Lenzerini, M., Vassiliou, Y., & Vassiliadis, P. (2013). Fundamentals of data warehouses. Springer Science & Business Media.
Krishnan, K. (2013). Data warehousing in the age of big data. Amsterdam: Morgan Kaufmann.
Krochmal, M., Cisek, K., & Husi, H. (2018). Database creation and utility. Integration of Omics Approaches and Systems Biology for Clinical Applications, 286-300.
Lee, K. H., Lee, Y. J., Choi, H., Chung, Y. D., & Moon, B. (2012). Parallel data processing with MapReduce: A survey. ACM SIGMOD Record, 40(4), 11-20.
Lemke, C., Mindnich, T., Weyerhaeuser, C., Faerber, F., & Sattler, K. U. (2014). Accelerated query operators for high-speed, in-memory online analytical processing queries and operations. U.S. Patent No. 8,892,586. Washington, DC: U.S. Patent and Trademark Office.
Nebot, V., & Berlanga, R. (2012). Building data warehouses with semantic web data. Decision Support Systems, 52(4), 853-868.
Pääkkönen, P., & Pakkala, D. (2015). Reference architecture and classification of technologies, products and services for big data systems. Big Data Research, 2(4), 166-186.
Tan, W., Blake, M. B., Saleh, I., & Dustdar, S. (2013). Social-network-sourced big data analytics. IEEE Internet Computing, (5), 62-69.
Vera-Baquero, A., Colomo-Palacios, R., & Molloy, O. (2013). Business process analytics using a big data approach. IT Professional, 15(6), 29-35.