# Algorithms in General

An algorithm is either a process or a set of rules used to calculate information or solve problems in operations, this can be done manually however is commonly used by computers. This is because computers can operate at a rate that humans cannot compete with, meaning more often than not a computer is the best solution. This is more so the case even when simple algorithms are used with one another, for when this occurs a large amount of data is produced or used, making the task impossible for humans.

Your company may already produce large amounts of data, which could be unorganised or have duplications, this this you would require the data to be cleansed before integrating it with a modern system. This process requires a sorting algorithm. Sorting algorithms are used to order and manage lists of data, considering the time required to implement the algorithm and space required to store the data associated with it. All the algorithms we use, we know how to do manually, this is because a lot of programming languages have prebuilt algorithms, meaning implementation is easy but understanding is another aspect entirely. Lowe (2017), explains that because Java has prebuilt sorting algorithms, calculating them is often an intellectual exercise with the purpose of demonstrating your understanding of how each algorithm works.

Marszałek (2018) explains that merge sort is often used in databases, for organisation and searching of data. It is identified as the most appropriate sorting algorithm for serializing data in databases such as NoSQL as discussed by Woźniak et al (2013). Meaning where data is unorganised an approach is required that allows a user to retrieve information with a route. Merge sort is used to divide unsorted data into *n* sub-arrays, each containing one element. Adjacent, single element sub-arrays are then merged and sorted back together until all that remains is one merged array.

Insertion sort is another basic sorting algorithm. It generates a sorted array with one piece of data at a time, in principle it works through insertion of data at a particular point. Bender et al (2006) explains that the target location is found, and each new piece of data is inserted after this location ahead by one array position. It is not good for large data as it is less efficient than that of a quick sort or merge sort.

Quicksort works by first selecting an element, usually at the beginning or end of the array, and using this element as a pivot. The idea is to sort any elements that are smaller than the pivot to the left of the array, and any elements that are larger than the pivot will be right of the array, meaning the array will be partitioned. The algorithm then recurses and does this for each partitioned section of the array until it is sorted, and then concatenated into the complete sorted array as explained by Hoare (1962).

Bucket sort revolves around creating a ‘bucket’ for the data and depositing the values of these arrays into these buckets. They can then be sorted in their separate arrays and concatenated once sorted. Each bucket is given a ceiling and a floor, and this decides which bucket each value is placed in. The values of a bucket with index x, must all be smaller than the values in a bucket with index (x+1). Once the data has been separated into buckets, they can be sorted using insertion sort for example. All the buckets can then be joined to give a fully sorted array. Bucket sort would have been a strong choice for the mostly sorted array, or the randomly sorted array, as the best and average case complexity is extremely strong. However, it has a poor worst-case complexity, so might not have been a good choice for the first algorithm.

The selection sort considers the array in two halves, sorted and unsorted. A counter passes through the array and takes note of the value of the lowest element. If it encounters any element with a value less than the held value, it takes note of this instead. Once the array has been traversed, the held value is moved to the front of the array, and this is now part of the sorted half of the array. This process continues until the array is fully sorted. Selection sort is one of the best in terms of space complexity, as it simply stores a single value, and sends it back in the array. However, it must go through the whole array for each element in the array, which makes it perform poorly in terms of time complexity.

In randomly distributed sets of data, typically the worst-case scenario would be used in the analysis in terms of the complexity of the algorithm, as this will not underestimate how the algorithm will perform. This approach can be improved as it can improperly suggest that an algorithm would perform poorly as discussed by Spielman and Teng (2004). One response to this is to use the average case complexity, however, there are also issues with using this approach. It is rare that the average case complexity for an algorithm is known due to the fact this depends on the data that the algorithm is being applied to, and even then, Spielman and Teng (2004) point out that it can be unconvincing.

They propose another alternative, which they call ‘smooth analysis’. Essentially, smooth analysis “continuously interpolates” between the average and worst-case analysis of the algorithm. Spielman and Teng (2004) outline how smoothed complexity works based upon the ‘slight random perturbations’ of the data being inputted, and that if there is a low smoothed complexity, then it can be expected to perform well. This is a relatively new way of analysing the effectiveness of algorithms and would be overboard for data sets of the size in the task. For future work with large amounts of data and great constraints in terms of space and time of the algorithm, smoothed analysis could be considered to better determine how effective specific algorithms would be for a task.

BENDER, M.A., FARACH-COLTON, M. AND MOSTEIRO, M.A., 2006. Insertion sort is O (n log n). *Theory of Computing Systems*, *39*(3), pp.391-397.

HOARE, C., 1962. Quicksort. *The Computer Journal* [online]. 5 (1), pp. 10-16. [Accessed 5 April 2020].

LOWE, D., 2017. Java All-In-One for Dummies. 5th ed. John Wiley & Sons, Inc.

MARSZAŁEK, Z., 2018. Performance tests on merge sort and recursive merge sort for big data processing. *Technical Sciences/University of Warmia and Mazury in Olsztyn*.

SPIELMAN, D. and TENG, S., 2004. Smoothed analysis of algorithms. Journal of the ACM [online]. 51 (3), pp. 385-463. [Accessed 8 November 2020].

WOŹNIAK, M., MARSZAŁEK, Z., GABRYEL, M. AND NOWICKI, R.K., 2013. On quick sort algorithm performance for large data sets. *Looking into the Future of Creativity and Decision Support Systems, AMJ Skulimowski, Ed*, pp.7-9.