Computing Issues: März 2015

Hello together,

About

today, it is about an issue, which concerns every computer consultant, programmer or computer scientist.
This topic is discussed about 56 million times on google (search string "list vs vector"). We need a clear decision to program qualitative and efficient programs. It is late in time to clarify the misunderstanding. And, you get the information for free. Independent on your location. This information is important to everyone, independent if you are a chief or a student.

The question is: What is the preferred container for your data?

We enlighten two containers: the list and the dynamic array (also called vector).

List

A list is a set of nodes, which consists of the data and one pointer to the next node. Nodes can be everywhere in the memory.
To have a reference to the list, we have a pointer to start of the list.
With few effort, we can also have a pointer to end of the list. This end-pointer is state of the art and typically the case (e.g. in STL lists), so we assume this additional condition.
More about lists you can read in "Algorithm and Data Structures"[1] or in the internet[2]

Vector

A vector is a set of data. This set is continuously combined in the memory.
To have a reference to the vector we have a so called offset.
More about dynamic arrays (vectors) can be read in "Algorithm and Data Structures"[1] or in the internet[3].

The court

Most people prefer vectors over list. This was my experience in my research.
Additionally, People using vectors blame people using lists to use vectors.
We assume therefore, that vector is the claimant and list is the defender.
So, we start with the arguments of using a vector.

Arguments of people using vectors:
- random access
- less memory
- faster allocation of memory
- faster indexing technical and theoretical
- compactness / discriminative

Claim of vectors

Random Access
There is the argument, that vectors have random access to their underlying data. This means, with a simple calculation of offset+x*s you get the memory position of the x'th data with size of data s. So, you can access the memory randomly in time O(1).
They claim: A list have to iterate from start x'th times to the next node.
This take O(n) time (see [1] for explanation of O-Notation).

Less Memory
They argue, that a list have to allocate additional memory for their next pointers. Lists allocate n times the next pointer, with a pointer currently 4 bytes in memory.
So, a dynamic array would only allocate the necessary space, while list would allocate the necessary space plus the next pointers. This is more space than necessary.

Faster Allocation of Memory
They claim the list to allocate the memory every time of a next element, which is every time a request to the system for free memory space. They argue, the request would take time, because some memory is already allocated and some memory is not allowed to use. Therefore, the system have to use complicated algorithms to request the memory space. Vectors, on the other hand, allocate only once their whole data space.

Faster Indexing Technical and Theoretical
First of all the technical is faster for vectors. They only have to calculate the next address of an arbitrary element in the container. While lists are using the next pointer to get the address. Therefore, they have to fetch the memory for the next pointer, while an array have only one calculation.
The second is the theoretical disadvantage of a list. The list have to iterate from start to the x'th node to get the data. This takes O(n) time.

Compactness / Discriminative
Additionally, there is an argument that lists are not compact. The nodes are equally distributed over the memory. So, loading the data into a cache does not have the same effect like for dynamic arrays.
Further on, "compactnes matters" - is it called from a famous computer expert.
This argument ("matters") is a little bit complicated. As far as I understand, it means: if you have compact data, you are not jumping around every time.
I will add the argument "Discriminative". This means, an array is allocating a whole memory block, so the data is distinguished between blocks. If your input is now distinguished between blocks, it is more easy to discriminate for the computer than it would be if the data is equally distributed. Some algorithms, which try to predict, have that problem of discrimination. This is used for example in the area of pattern recognition.

Plea

The defence appeals to innocent.

Random Access
This is true. Arrays have a faster random access than lists.
When do you need it? The most time you process data sequential.
Of course, your sensor or user input have random values, but the input is processed sequential.
Only if you have the inputs, which are coordinates according to an element of data. This is for example the case, if you have user input from the screen. On the screen you have input by the mouse in the form of coordinates. According to the coordinates, there are underlying elements on the screen, which should be processed.
Lists deputy argue, this case is the only one. In all other cases it is better to use lists, because they are processed sequential.
Please use lists more often than dynamic arrays!

Less Memory
To be honest, arrays wast memory also. They allocate memory, which is not immediately used.
Therefore ,you have to understand the concept for dynamic arrays.
Vector argue: There are tricks to reduce this allocation problem by creating a list of vectors.
The plea calls objection - this is manipulative. In this case comparison between lists and vectors are not possible.
Lets go back to the description of dynamic arrays.
Dynamic arrays double the size of allocated data at growing. Growing happens when you insert a new element to a filled dynamic array. In this moment, the array is allocating twice the current size and copies the old data to the new memory space. The old space is deleted afterwards.
This will take time and space. The time you need is amortized O(1) (see Table of O-Notations in [2]) to append a new element at the back of the array (the so called push_back() function). The space you need is amortized 3/4 * n space. This means you waste 1/4 * n space. We assume that an array is filled step by step with push_back(). So the averaged space used by the array is 3/4 full and 1/4 empty.
The plea want to submit evidence:
This evidence describe the amount of memory wasted by the list.
A pointer has currently 4byte. This is allocated n times. Makes 4*n byte.
The plea argue: If you have a data size s, which is over 16 byte, the array would wast memory of 1/4 * n * s. This is minimal 4 * n byte of memory. This is at least waisting as much space as a list do, for a >= 16byte data element.
On a machine with 64bi, using two integers reaches the size of 16bytes. See therefore [4].
The plea appeals to innocent, because of really rare exceptions of data structs below 16 byte.
Vector is interrupting: For big data we use pointers to the data.
The plea counters: Then, the advantage of arrays are going lost, because you need additional memory fetching.
The plea appeals to innocent in this case, cause lack of evidence!
Please use lists for data-elements above 16 bytes!
Doubles have already 8 byte[5]!

Faster Allocation of Memory
The plea is defending to innocent, because memory allocation is issued by array and list equally.
The defence submits some evidence: We are discussing about dynamic memory allocation. And about the currently common method of using blocks to handle the memory.
Dynamic memory allocation means, memory allocation while run-time.
Compared with static memory allocation, while compile-time, this is not yet allocated when you start the program.
The currently common method to use blocks is used to help avoiding fragmentation. As far as I understand, fragmentation is a problem, when you want to insert a big data-element and found no continuous space for it. Instead you have many small free spaces. So, you would have enough memory, but because of fragmentation there is no space for the big data.
The plea argue, if all data would have the same size, fragmentation would not happen. The problem therefore is not the size per se, but the difference in size.
So, arrays and lists are both equally guilty.
The plea appeals: In case of the accused.
The court sad: Fragmentation is not relevant here.

The plea gives them right and goes back to the topic.
The claimant intersects: The problem of allocation is not the fragmentation, but the overhead by the repeated allocation by the list.
The plea argues, that disadvantage is only different from array allocation if you use big arrays. In case of big arrays, the system will allocate a new block according to the size of the array, which is bigger than a threshold. In case of small arrays, the system have to look into the current block also. In this case of small arrays, you have to allocate often small arrays for big input.
So, you are also investigating the overhead.
Because of equal concern, the plea appeals for innocent.
Please use lists more often!

Faster Indexing - Technical and Theoretical
Technical is the indexing on arrays a calculation, whereas the indexing on lists is a fetching.
The plea agrees: This is a difference.
The computational steps on a list are:
1. reading memory address of next pointer from the register R2
2. fetching data from memory to register R1
3. and fetching next pointer from memory to register R2

The computational steps on a vector are:
1. increasing the index in the register
2. adding the offset from the register
3. fetching data from memory

Point 1. of array is typically an improvement for arrays.
The true steps of array would be:
1. reading index from register R1
2. calculating index + 1
3. reading offset from register R2
4. calculating index + offset
5. fetching data from memory to register R3

The plea appeals to think about the advantage on pipe-lining for fewer computational steps, as for lists.
Lists have no calculations.
The plea appeals to think about the register size.
List need only two register and therefore the processor can handle more lists synchronously.
Please, use lists for mixing different input!

Compactness / Discriminative
The plea begs the court to drop the argument of compactness, because it's misunderstanding.
My own argument Discriminative will be dropped, because the input is generally not discriminative. If the input would be discriminative, we could easily predict, just depending on which input we got. You would say, if the sensor A gives a signal it is an elephant and if the sensor B gives the signal it is a rabbit.

Conclusion

None of the arguments of vectors have had an evidence to prefer vectors over lists. Also, the performance is not arguable better.

Dynamic arrays are not better than lists.
Lists are more related to Touring-Machines.
If memory matters, use lists. If speed matters use vectors.
The problem is currently, that the additional data (next pointer) in the list
is used in the processing, while the additional data in the array (reserved fields) is not used in the processing.
This means, if speed really matters, the array would be better. Even though, fetching is slower than calculation, and lists do one more fetching.
On the other hand, the additional data of the list is only 4 byte for each element. And it is more recommended to use lists for big data (>16 bytes). See argument "less memory" above.
This means, if memory matters, use lists.

Beside the argument of memory, random access takes into account.
Use vectors only if you have random access mapping to data.
For example a user input by mouse on the screen. The mouse coordinate can be used to directly find the underlying data. This is also the case for mapping. In this case a map, which is a special array, is the best choice.
In all other cases, please prefer lists over vectors.
Your first question should be: Is my underlying data bigger than 16 bytes, then I use a list.
Now you can argue, nearly every primitive data type is below 16 byte (on 32bit machines). So we can split the data type bigger than 16 byte in several arrays with primitive data types.
That is true, you can do this if you can program in that way.
Normally, it is like baking a cake. You need different ingredients like milk, wheat, eggs and butter. Now you are combining the ingredients and processing them in the oven. In the end you should get a cake.
I don't know how to bake a cake without mixing the ingredients together.
Perhaps it is possible to mix the ingredients together in the last moment. This would be very hard to program.

Other Argue

"One block allocation - if you don't have enough memory to provide a single block (but you have sufficient scattered memory blocks) to allocate the space for the array then you'll need to defragment and other similar stuff to first create a free block of that size. So you may like to term it as improper utilization of memory :-)"
[http://geekexplains.blogspot.de/2008/05/array-vs-linked-list-advantages-and.html]

[http://programmers.stackexchange.com/questions/185222/what-is-the-point-of-using-lists-over-vectors-in-c]

References

[1] Th. Ottmann, P. Widmayer. Algorithmen und Datenstrukturen. xvii+716 Seiten, 4. Auflage, Spektrum Akademischer Verlag, Heidelberg 2002
[2] http://en.wikipedia.org/wiki/Linked_list 2015
[3] http://en.wikipedia.org/wiki/Dynamic_array 2015
[4] http://en.wikipedia.org/wiki/Integer_%28computer_science%29
[5] http://en.wikipedia.org/wiki/Floating_point

Computing Issues

Samstag, 28. März 2015

Why to use linked list more often than dynamic arrays