数据结构和算法@一些记录

数据结构和算法

写在前面

线性结构作为最常用的数据结构，其特点是数据元素之间存在一对一的线性关系，线性结构拥有两种不同的存储结构，即顺序存储结构和链式存储结构
非线性结构特点是一个结点元素可能有多个直接前驱和多个直接后继。常见的非线性结构有：二（多）维数组、树、图

数组

线性表
连续的内存空间声明数组时需要预先指定大小
存储相同类型的数据
随机访问

链表

当一个List拥有快速访问功能时，其遍历方法采用for循环最快速。而没有快速访问功能的List，遍历的时候采用Iterator迭代器最快速。当我们不明确获取到的是Arraylist，还是LinkedList的时候，我们可以通过RandomAccess来判断其是否支持快速随机访问，若支持则采用for循环遍历，否则采用迭代器遍历。 ArrayList 其实就是一个动态数组。优势就是将数组的操作封装起来，比如增删查，以及动态扩容，每次1.5倍：int newCapacity = oldCapacity + (oldCapacity >> 1);，同时还会防止溢出进行比较MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8。 ArrayList中的线程安全问题：size成员变量问题，设置值得时候的原子性操作问题elementData[size++] = e;元素减少，缩容：elementData[--size] = null; // clear to let GC do its work,trimToSize()
ArrayList.subList() 方法返回的List是ArrayList中某段数据的一个视图. 因此, 在操作此方法返回的List时, 同样会改变ArrayList的数据，同样在CopyOnWriteArrayList也存在这个问题，获取到一个远列表的试图，因此取subList时需要考虑线程安全问题，更好的方式是使用不可变列表（例如Guava的ImmutableList），避免对原始列表的更新导致子列表抛出异常
Fail-Fast机制只能用作bug判断，并不是一个强保证。也好理解，其实多线程环境下也不是一定会触发抛ConcurrentModificationException异常的条件
Fail-Safe任何对集合结构的修改都会在一个复制的集合上进行修改，但是会有一下问题需要复制集合，产生大量的无效对象，开销大；无法保证读取的数据是目前原始数据结构中的数据
Vector和ArrayList基本一样，线程安全；Vector可以设置capacityIncreament(容量增长量)，而ArrayList不可以
CopyOnWriteArrayList的几个特点：thread-safe：对于引起list变化（内容或者结构）都是使用ReentrantLock加锁，然后通过底层的数组copy完成的。虽然可以保证安全，但是觉得效率低呀，需要额外的复制数据的操作，只适合读操作远多于写操作的场景。

栈

先进后出，后进先出
栈又分顺序栈和链式栈。基于数组实现的栈叫做顺序栈，基于链表实现的则叫链式栈
堆是堆（heap），栈是栈（stack），堆栈是栈
函数调用的局部状态之所以用栈来记录是因为这些数据的存活时间满足“后入先出”（LIFO）顺序，而栈的基本操作正好就是支持这种顺序的访问
栈这个数据结构，真正模拟的，其实是嵌套关系，一切具备嵌套性质的东西，用堆栈做是最自然的
Java中的Stack类，由于是继承于Vector，所以它是线程安全的，其本质上是由数组这个基本数据结构实现的，知道这点之后可以更好的进行理解。

队列

树

每个节点都只有有限个子节点或无子节点；没有父节点的节点称为根节点；每一个非根节点有且只有一个父节点；除了根节点外，每个子节点可以分为多个不相交的子树；树里面没有环路
相关名词：层，根节点，子节点，父节点，兄弟节点，叶子节点，度数
树节点的度数即为该节点孩子的个数，一棵树的度指其中节点的度最大值

二叉树Binary Tree，每个结点至多拥有两棵子树（即二叉树中不存在度大于2的结点），若二叉树的层次从0开始，则在二叉树的第i层至多有2^i个结点（i>=0）

From here https://dev.to/marshallhelenm/binary-search-trees-part-i-4p05

二叉查找树 Binary Search Tree
- The left subtree of a node contains only nodes with keys lesser than the node’s key
- The right subtree of a node contains only nodes with keys greater than the node’s key
- The left and right subtree each must also be a binary search tree
- There must be no duplicate nodes

完美二叉树Perfect Binary Tree(PBT)，国内一般称之为满二叉树，所有的叶子节点具有相同的深度。完美(Perfect)二叉树一定是完全(Complete)二叉树，但完全(Complete)二叉树不一定是完美(Perfect)二叉树；完美(Perfect)二叉树一定是完满(Full)二叉树，但完满(Full)二叉树不一定是完美(Perfect)二叉树
完全二叉树Complete Binary Tree （CBT)，从根结点到倒数第二层满足完美二叉树，最后一层可以不完全填充，其叶子结点都靠左对齐。
完满二叉树Full Binary Tree (FBT)，所有非叶子结点的度都是2。（只要你有孩子，你就必然是有两个孩子）
平衡二叉树In computer science, an AVL tree (named after inventors Adelson-Velsky and Landis) is a self-balancing binary search tree，当且仅当任何节点的两棵子树的高度差不大于1的二叉树；
霍夫曼树 Huffman Tree
- 权越大的叶子离根越近
- 在权为w1,w2,…,wn的n个叶子结点的所有二叉树中，所有叶子结点的带权路径长度之和最小的二叉树称为赫夫曼树或最优二叉树
B-Tree，To understand the use of B-Trees, we must think of the huge amount of data that cannot fit in main memory. When the number of keys is high, the data is read from disk in the form of blocks. Disk access time is very high compared to the main memory access time. The main idea of using B-Trees is to reduce the number of disk accesses. Most of the tree operations (search, insert, delete, max, min, ..etc ) require O(h) disk accesses where h is the height of the tree. B-tree is a fat tree. The height of B-Trees is kept low by putting maximum possible keys in a B-Tree node. Generally, the B-Tree node size is kept equal to the disk block size. Since the height of the B-tree is low so total disk accesses for most of the operations are reduced significantly compared to balanced Binary Search Trees like AVL Tree, Red-Black Tree, ..etc
- B-Tree is known as a self-balancing tree as its nodes are sorted in the inorder traversal
- All the leaf nodes of the B-tree must be at the same level
- Above the leaf nodes of the B-tree, there should be no empty sub-trees
- B- tree’s height should lie as low as possible
- ……………………………………….我是分隔线…………………………………………….
- A B-Tree is defined by the term minimum degree ’t’. The value of t depends upon disk block size
- Every node except root must contain at least t-1 keys. The root may contain minimum 1 key
All nodes (including root) may contain at most 2t – 1 keys？？？？？
Number of children of a node is equal to the number of keys in it plus 1
All keys of a node are sorted in increasing order. The child between two keys k1 and k2 contains all keys in the range from k1 and k2
B-Tree grows and shrinks from the root which is unlike Binary Search Tree. Binary Search Trees grow downward and also shrink from downward.？？？？？
Like other balanced Binary Search Trees, time complexity to search, insert and delete is O(log n).

B+Tree storing data pointers only at the leaf nodes of the tree

B-Tree结构图中可以看到每个节点中不仅包含数据的key值，还有data值。而每一个页的存储空间是有限的，如果data数据较大时将会导致每个节点（即一个页）能存储的key的数量很小，当存储的数据量很大时同样会导致B-Tree的深度较大，增大查询时的磁盘I/O次数，进而影响查询效率。在B+Tree中，所有数据记录节点都是按照键值大小顺序存放在同一层的叶子节点上，而非叶子节点上只存储key值信息，这样可以大大加大每个节点存储的key值数量，降低B+Tree的高度

红黑树 self-balancing Binary Search Tree (BST)
- Every node has a color either red or black
- Root of tree is always black
- There are no two adjacent red nodes (A red node cannot have a red parent or red child)
- Every path from a node (including root) to any of its descendant NULL node has the same number of black nodes
- 红黑树是保持“黑平衡”的二叉树，严格意义来说，红黑树不是一棵平衡二叉树
Comparison with AVL Tree
- The AVL trees are more balanced compared to Red-Black Trees, but they may cause more rotations during insertion and deletion. So if your application involves many frequent insertions and deletions, then Red Black trees should be preferred. And if the insertions and deletions are less frequent and search is a more frequent operation, then AVL tree should be preferred over Red-Black Tree
线段树（segment tree）
R树
字典树
前缀树
后缀树

图

好玩的

你要不会唠嗑能不能别硬唠。--[缝纫机乐队]

参考

极客时间-数据结构与算法
关于不同树之间的区别

本文由发表于 2020-08-24 21:14:47

> 可在下面留言（需要有 GitHub 账号）

写在前面

数组

链表

栈

队列

树

图

好玩的

相关介绍

参考