A dataset is a collection of numerical figures describing a subject of interest or a process under study. Datasets can be represented in tabular form, as a list enclosed in curly brackets, or as loose groups of numbers. Datasets are usually labeled according to the subject matter they represent. Examples of datasets that I might examine in my career include a company's sales over a financial year or its inventory data over a given period.
Three primary tools for analyzing datasets are the measures of central tendency: mean, mode, and median. The mean is the average value of the data, calculated by summing the values and then dividing by the number of items in the dataset. The mode is the value that appears most often in the dataset. The median, on the other hand, is the value lying in the middle of the dataset when the data is arranged in ascending order; when the dataset has an even number of values, it is the average of the two middle values. The dataset should, therefore, be arranged in numerical order from smallest to largest before finding the median. Of the three measures of central tendency, the mean is the most efficient measure and is therefore commonly taken to represent the center (Sahai & Ray, 1980). Unlike the mode and median, the mean uses all the sample information in the dataset.
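As a minimal sketch of the three measures described above, the following Python snippet computes the mean, median, and mode of a small dataset using the standard library's statistics module. The sales figures are hypothetical values chosen only for illustration; they do not come from any real company.

```python
from statistics import mean, median, mode

# Hypothetical monthly sales figures (illustrative values, not real data)
sales = [12, 15, 11, 15, 18, 14, 13]

# Mean: sum of the values divided by the number of items
sales_mean = sum(sales) / len(sales)

# Median: the middle value once the data is in ascending order
ordered = sorted(sales)
sales_median = ordered[len(ordered) // 2]  # valid here because the count is odd

# Mode: the value that appears most often
sales_mode = mode(sales)

# The library functions should agree with the manual calculations
assert sales_mean == mean(sales)
assert sales_median == median(sales)

print("mean:", sales_mean)
print("median:", sales_median)
print("mode:", sales_mode)
```

Note that the manual median calculation above only handles an odd number of values; statistics.median also covers the even case by averaging the two middle values.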
In instances where the dataset comprises highly skewed data, as is often the case with sales, the mean tends to be affected by extreme values. In these situations, the median is the more appropriate measure of center. Half of the values in the dataset are greater than the median, and half are less. The median tends to be affected less by outliers (Von Hippel, 2005). The median is therefore the best-suited measure of central tendency in this situation.
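The effect of an extreme value on the mean can be illustrated with a short Python sketch. The figures below are hypothetical and chosen only to make the contrast visible: a single unusually large sale is added to an otherwise ordinary set of values, and the mean and median are compared before and after.

```python
from statistics import mean, median

# Hypothetical sales figures (illustrative values, not real data)
typical_sales = [20, 22, 25, 27, 30]

# The same data with one extreme value added
skewed_sales = typical_sales + [500]

mean_before = mean(typical_sales)
median_before = median(typical_sales)

mean_after = mean(skewed_sales)
median_after = median(skewed_sales)

# How far each measure moved after the outlier was added
mean_shift = mean_after - mean_before
median_shift = median_after - median_before

print("mean before/after:", mean_before, mean_after)
print("median before/after:", median_before, median_after)
print("shift in mean:", mean_shift)
print("shift in median:", median_shift)
```

In this example the single outlier moves the mean far more than it moves the median, which stays close to the bulk of the values.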
References
Sahai, A., & Ray, S. K. (1980). An efficient estimator using auxiliary information. Metrika, 27(1), 271-275.
Von Hippel, P. T. (2005). Mean, median, and skew: Correcting a textbook rule. Journal of Statistics Education, 13(2).