In dynamic object-oriented languages, polymorphism need not be achieved through inheritance; see the discussions of duck typing for details.
Java has supported local variable type inference (the var keyword) since Java 10.
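A minimal example (JDK 10+):

```java
import java.util.ArrayList;

public class VarDemo {
    public static void main(String[] args) {
        var list = new ArrayList<String>(); // inferred as ArrayList<String>
        list.add("hello");
        for (var s : list) {                // inferred as String
            System.out.println(s.length());
        }
    }
}
```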
https://liuweiqiang.me/2022/03/31/java-io.html
ArrayList is backed by an Object[]; internally it uses two shared empty arrays (EMPTY_ELEMENTDATA and DEFAULTCAPACITY_EMPTY_ELEMENTDATA).
The comment in the source code puts it this way:
We distinguish this from EMPTY_ELEMENTDATA to know how much to inflate when first element is added.
In the JDK 8 source, only the no-argument constructor assigns DEFAULTCAPACITY_EMPTY_ELEMENTDATA:
```java
public ArrayList() {
    this.elementData = DEFAULTCAPACITY_EMPTY_ELEMENTDATA;
}
```
When add is called, only DEFAULTCAPACITY_EMPTY_ELEMENTDATA is inflated to DEFAULT_CAPACITY (10):
```java
public boolean add(E e) {
    ensureCapacityInternal(size + 1);  // Increments modCount!!
    elementData[size++] = e;
    return true;
}

private void ensureCapacityInternal(int minCapacity) {
    ensureExplicitCapacity(calculateCapacity(elementData, minCapacity));
}

private static int calculateCapacity(Object[] elementData, int minCapacity) {
    if (elementData == DEFAULTCAPACITY_EMPTY_ELEMENTDATA) {
        return Math.max(DEFAULT_CAPACITY, minCapacity);
    }
    return minCapacity;
}

private void ensureExplicitCapacity(int minCapacity) {
    modCount++;
    // overflow-conscious code
    if (minCapacity - elementData.length > 0)
        grow(minCapacity);
}
```
The core growth method, grow, expands the array by a factor of 1.5 while making sure the new capacity covers minCapacity:
```java
private void grow(int minCapacity) {
    // overflow-conscious code
    int oldCapacity = elementData.length;
    int newCapacity = oldCapacity + (oldCapacity >> 1);
    if (newCapacity - minCapacity < 0)
        newCapacity = minCapacity;
    if (newCapacity - MAX_ARRAY_SIZE > 0)
        newCapacity = hugeCapacity(minCapacity);
    // minCapacity is usually close to size, so this is a win:
    elementData = Arrays.copyOf(elementData, newCapacity);
}
```
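A small reflective sketch to observe the lazy allocation and 1.5x growth described above (written against JDK 8 semantics; on newer JDKs the module system may deny the field access):

```java
import java.lang.reflect.Field;
import java.util.ArrayList;

public class ArrayListGrowthDemo {
    // Reads the length of the backing elementData array via reflection.
    private static int capacityOf(ArrayList<?> list) throws Exception {
        Field f = ArrayList.class.getDeclaredField("elementData");
        f.setAccessible(true);
        return ((Object[]) f.get(list)).length;
    }

    public static void main(String[] args) throws Exception {
        ArrayList<Integer> list = new ArrayList<>();
        System.out.println(capacityOf(list)); // 0: nothing allocated yet
        list.add(0);
        System.out.println(capacityOf(list)); // 10: inflated to DEFAULT_CAPACITY on first add
        for (int i = 1; i <= 10; i++) list.add(i);
        System.out.println(capacityOf(list)); // 15: grown by oldCapacity >> 1
    }
}
```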
The underlying structure (the first level) of HashMap is a Node<K,V>[] array.
An instance of HashMap has two parameters that affect its performance: initial capacity and load factor. The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at the time the hash table is created. The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased. When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the hash table is rehashed (that is, internal data structures are rebuilt) so that the hash table has approximately twice the number of buckets.
HashMap's core method:
```java
public V put(K key, V value) {
    // The internal hash-spreading ("perturbation") function lowers the chance of
    // collisions in a power-of-two table; see https://www.zhihu.com/question/20733617/answer/111577937
    return putVal(hash(key), key, value, false, true);
}

final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}
```
i = (n - 1) & hash computes the bucket index, i.e. the key's hash modulo the table length. If, while inserting into a bucket's linked list, binCount >= TREEIFY_THRESHOLD - 1 and the table capacity is at least MIN_TREEIFY_CAPACITY (64), the bin is treeified (java.util.HashMap.TreeNode#treeify). If the capacity is still below 64, the table is only resized instead, with no treeification after the resize; the next conflict triggers another resize, until the capacity reaches 64.
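Because n is always a power of two, (n - 1) & hash selects the low bits of the hash and agrees with hash % n for non-negative values, while avoiding a division — a quick check:

```java
public class IndexDemo {
    public static void main(String[] args) {
        int n = 16; // table length: always a power of two in HashMap
        for (int hash : new int[]{0, 5, 17, 123_456}) {
            // Same bucket index either way; the mask needs no division instruction.
            System.out.println(((n - 1) & hash) + " == " + (hash % n));
        }
    }
}
```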
Red-black tree put:
```java
final TreeNode<K,V> putTreeVal(HashMap<K,V> map, Node<K,V>[] tab,
                               int h, K k, V v) {
    Class<?> kc = null;
    boolean searched = false;
    TreeNode<K,V> root = (parent != null) ? root() : this;
    for (TreeNode<K,V> p = root;;) {
        int dir, ph; K pk;
        if ((ph = p.hash) > h)
            dir = -1;
        else if (ph < h)
            dir = 1;
        else if ((pk = p.key) == k || (k != null && k.equals(pk)))
            return p;
        else if ((kc == null &&
                  (kc = comparableClassFor(k)) == null) ||
                 (dir = compareComparables(kc, k, pk)) == 0) {
            if (!searched) {
                TreeNode<K,V> q, ch;
                searched = true;
                if (((ch = p.left) != null &&
                     (q = ch.find(h, k, kc)) != null) ||
                    ((ch = p.right) != null &&
                     (q = ch.find(h, k, kc)) != null))
                    return q;
            }
            dir = tieBreakOrder(k, pk);
        }

        TreeNode<K,V> xp = p;
        if ((p = (dir <= 0) ? p.left : p.right) == null) {
            Node<K,V> xpn = xp.next;
            TreeNode<K,V> x = map.newTreeNode(h, k, v, xpn);
            if (dir <= 0)
                xp.left = x;
            else
                xp.right = x;
            xp.next = x;
            x.parent = x.prev = xp;
            if (xpn != null)
                ((TreeNode<K,V>)xpn).prev = x;
            moveRootToFront(tab, balanceInsertion(root, x));
            return null;
        }
    }
}
```
The tree orders nodes by hash first; as before, if an equal key is found the old value is returned. A negative dir goes left, a positive one right. When hashes are equal, the code tries a Comparable comparison (if the key class implements it); if still tied, tieBreakOrder compares class names and finally falls back to java.lang.System#identityHashCode.
resize performs both initialization and doubling:
```java
final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    if (oldCap > 0) {
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    else {               // zero initial threshold signifies using defaults
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    @SuppressWarnings({"rawtypes","unchecked"})
    Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    if (oldTab != null) {
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else { // preserve order
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}
```
Setting aside the initialization logic, resize essentially doubles the capacity (newThr = oldThr << 1) and then migrates each bucket according to whether it holds a tree node or a linked list. Capacities are kept at powers of two for algorithmic efficiency, e.g. so the index can be computed as e.hash & (newCap - 1).
HashSet is essentially a wrapper around HashMap, with the fixed value private static final Object PRESENT = new Object(). Interviews mostly cover the two collections above; the others show up more often in coding tests (algorithms).
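The wrapping amounts to the following (a simplified sketch of the JDK 8 approach, not the full class):

```java
import java.util.HashMap;

// Sketch: every Set operation delegates to an internal HashMap
// whose values are all the same dummy object.
public class SimpleHashSet<E> {
    private static final Object PRESENT = new Object();
    private final HashMap<E, Object> map = new HashMap<>();

    public boolean add(E e)           { return map.put(e, PRESENT) == null; }
    public boolean contains(Object o) { return map.containsKey(o); }
    public boolean remove(Object o)   { return map.remove(o) == PRESENT; }
    public int size()                 { return map.size(); }
}
```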
LinkedList is implemented as a doubly linked list:
```java
private static class Node<E> {
    E item;
    Node<E> next;
    Node<E> prev;

    Node(Node<E> prev, E element, Node<E> next) {
        this.item = element;
        this.next = next;
        this.prev = prev;
    }
}
```
LinkedList also implements the Deque (pronounced "deck", double-ended queue) interface, which additionally declares the stack operations.
ArrayDeque is a double-ended queue backed by an array that grows automatically on insertion.
PriorityQueue is a priority queue implemented as a heap, also backed by an Object[] array.
Priority queue represented as a balanced binary heap: the two children of queue[n] are queue[2n+1] and queue[2(n+1)].
Elements are appended at the end of the array and sifted up (compared against parents) to restore the heap invariant; when the head is removed, the last element is moved to the front and sifted down (compared against its two children) to restore the invariant.
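A quick demonstration that offer() sifts up and poll() sifts down, so elements come out in priority order regardless of insertion order:

```java
import java.util.PriorityQueue;

public class HeapDemo {
    public static void main(String[] args) {
        PriorityQueue<Integer> pq = new PriorityQueue<>();
        for (int x : new int[]{5, 1, 4, 2, 3}) {
            pq.offer(x); // appended at the end of the array, then sifted up
        }
        StringBuilder out = new StringBuilder();
        while (!pq.isEmpty()) {
            out.append(pq.poll()).append(' '); // head removed, last element sifted down
        }
        System.out.println(out); // 1 2 3 4 5
    }
}
```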
TreeMap is backed by a red-black tree similar to the one inside HashMap: after insertions and deletions it must be adjusted (recoloring or rotations) to restore the red-black invariants.
The red-black invariants bound the tree height by 2·lg(n+1); compared with an AVL tree it performs fewer rotations.
ConcurrentHashMap's core putVal (JDK 8):
```java
final V putVal(K key, V value, boolean onlyIfAbsent) {
    if (key == null || value == null) throw new NullPointerException();
    int hash = spread(key.hashCode());
    int binCount = 0;
    for (Node<K,V>[] tab = table;;) {
        Node<K,V> f; int n, i, fh;
        if (tab == null || (n = tab.length) == 0)
            tab = initTable();
        else if ((f = tabAt(tab, i = (n - 1) & hash)) == null) {
            if (casTabAt(tab, i, null,
                         new Node<K,V>(hash, key, value, null)))
                break;                   // no lock when adding to empty bin
        }
        else if ((fh = f.hash) == MOVED)
            tab = helpTransfer(tab, f);
        else {
            V oldVal = null;
            synchronized (f) {
                if (tabAt(tab, i) == f) {
                    if (fh >= 0) {
                        binCount = 1;
                        for (Node<K,V> e = f;; ++binCount) {
                            K ek;
                            if (e.hash == hash &&
                                ((ek = e.key) == key ||
                                 (ek != null && key.equals(ek)))) {
                                oldVal = e.val;
                                if (!onlyIfAbsent)
                                    e.val = value;
                                break;
                            }
                            Node<K,V> pred = e;
                            if ((e = e.next) == null) {
                                pred.next = new Node<K,V>(hash, key,
                                                          value, null);
                                break;
                            }
                        }
                    }
                    else if (f instanceof TreeBin) {
                        Node<K,V> p;
                        binCount = 2;
                        if ((p = ((TreeBin<K,V>)f).putTreeVal(hash, key,
                                                              value)) != null) {
                            oldVal = p.val;
                            if (!onlyIfAbsent)
                                p.val = value;
                        }
                    }
                }
            }
            if (binCount != 0) {
                if (binCount >= TREEIFY_THRESHOLD)
                    treeifyBin(tab, i);
                if (oldVal != null)
                    return oldVal;
                break;
            }
        }
    }
    addCount(1L, binCount);
    return null;
}
```
If the bucket head is null, the node is inserted with a CAS; otherwise the bucket head is locked.
Initialization in initTable (and the other resize operations) is synchronized via CAS on the sizeCtl field:
```java
private final Node<K,V>[] initTable() {
    Node<K,V>[] tab; int sc;
    while ((tab = table) == null || tab.length == 0) {
        if ((sc = sizeCtl) < 0)
            Thread.yield(); // lost initialization race; just spin
        else if (U.compareAndSwapInt(this, SIZECTL, sc, -1)) {
            try {
                if ((tab = table) == null || tab.length == 0) {
                    int n = (sc > 0) ? sc : DEFAULT_CAPACITY;
                    @SuppressWarnings("unchecked")
                    Node<K,V>[] nt = (Node<K,V>[])new Node<?,?>[n];
                    table = tab = nt;
                    sc = n - (n >>> 2);
                }
            } finally {
                sizeCtl = sc;
            }
            break;
        }
    }
    return tab;
}
```
```java
final Node<K,V>[] helpTransfer(Node<K,V>[] tab, Node<K,V> f) {
    Node<K,V>[] nextTab; int sc;
    if (tab != null && (f instanceof ForwardingNode) &&
        (nextTab = ((ForwardingNode<K,V>)f).nextTable) != null) {
        int rs = resizeStamp(tab.length);
        while (nextTab == nextTable && table == tab &&
               (sc = sizeCtl) < 0) {
            if ((sc >>> RESIZE_STAMP_SHIFT) != rs || sc == rs + 1 ||
                sc == rs + MAX_RESIZERS || transferIndex <= 0)
                break;
            if (U.compareAndSwapInt(this, SIZECTL, sc, sc + 1)) {
                transfer(tab, nextTab);
                break;
            }
        }
        return nextTab;
    }
    return table;
}
```
Reflection is mainly used in frameworks — Spring, Hibernate, MyBatis, Dubbo, Mockito, Jackson — letting them operate on classes dynamically at run time.
Run the following:
```java
public static void main(String[] args) {
    while (true) {
        System.out.println("x");
    }
}
```
With the JDK's own jstack tool you can see the main thread:
"main" #1 prio=5 os_prio=31 tid=0x000000015280b000 nid=0x1603 runnable [0x000000016d696000]
java.lang.Thread.State: RUNNABLE
at java.io.FileOutputStream.writeBytes(Native Method)
at java.io.FileOutputStream.write(FileOutputStream.java:326)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
- locked <0x00000006c0007de8> (a java.io.BufferedOutputStream)
at java.io.PrintStream.write(PrintStream.java:482)
- locked <0x00000006c00070b8> (a java.io.PrintStream)
at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)
at sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:291)
at sun.nio.cs.StreamEncoder.flushBuffer(StreamEncoder.java:104)
- locked <0x00000006c0007058> (a java.io.OutputStreamWriter)
at java.io.OutputStreamWriter.flushBuffer(OutputStreamWriter.java:185)
at java.io.PrintStream.write(PrintStream.java:527)
- eliminated <0x00000006c00070b8> (a java.io.PrintStream)
at java.io.PrintStream.print(PrintStream.java:669)
at java.io.PrintStream.println(PrintStream.java:806)
- locked <0x00000006c00070b8> (a java.io.PrintStream)
at me.liuweiqiang.hibernate.EventListener.main(EventListener.java:61)
Looking at the java.lang.Thread.State source, it has the following enum values:
NEW: Thread state for a thread which has not yet started.
RUNNABLE: a normally running thread.
BLOCKED: a thread that has not acquired a monitor lock is BLOCKED, whether it failed to get the lock when entering a synchronized method/block or it is re-acquiring the lock after wait() released it.
```java
public static void main(String[] args) throws Exception {
    Thread thread = new Thread(() -> {
        synchronized (EventListener.class) {
            System.out.println("x");
            try {
                EventListener.class.wait();
                // good practice: re-check the condition in a loop after waking up
            } catch (InterruptedException e) { // only caught after the lock is re-acquired
                e.printStackTrace();
            }
        }
    });
    thread.start();
    TimeUnit.SECONDS.sleep(1);
    synchronized (EventListener.class) {
        // Note that the notify signal can be missed: one possible schedule is that
        // the other thread has not called wait() (not even synchronized) when notify runs.
        EventListener.class.notifyAll();
        System.in.read();
    }
}
```
WAITING: entered via Object.wait() (called while holding the lock, not yet notified), Thread.join(), or LockSupport.park(). For ReentrantLock.lock and ReentrantLock.newCondition().await(), the underlying call is LockSupport.park, so the thread state is WAITING or TIMED_WAITING rather than BLOCKED.
Tip: notify is "unreliable" in the sense that the intended waiter can miss the signal, because it may not have reached wait() (or even synchronized) yet when notify runs.
TIMED_WAITING: same as above but with a timeout argument; Thread.sleep also counts.
As with distributed systems, different models on concurrent processors yield different consistency levels. Simply put, the JMM is a model abstracting the contract between Java programmers and JVM implementations.
The Java Memory Model describes what behaviors are legal in multithreaded code, and how threads may interact through memory. It describes the relationship between variables in a program and the low-level details of storing and retrieving them to and from memory or registers in a real computer system. It does this in a way that can be implemented correctly using a wide variety of hardware and a wide variety of compiler optimizations. Java includes several language constructs, including volatile, final, and synchronized, which are intended to help the programmer describe a program's concurrency requirements to the compiler. The Java Memory Model defines the behavior of volatile and synchronized, and, more importantly, ensures that a correctly synchronized Java program runs correctly on all processor architectures.
For details see http://www.cs.umd.edu/~pugh/java/memoryModel/jsr-133-faq.html
A Chinese-language treatment is Cheng Xiaoming's 《深入理解Java内存模型》.
From an application programmer's perspective, among the most important rules is the happens-before order (visibility): If one action happens-before another, then the first is visible to and ordered before the second.
It should be noted that the presence of a happens-before relationship between two actions does not necessarily imply that they have to take place in that order in an implementation. If the reordering produces results consistent with a legal execution, it is not illegal.
Besides the happens-before order, the Java Language Specification also defines the synchronization order (visibility and atomicity). Synchronization order is subsumed into happens-before:
If an action x synchronizes-with a following action y, then we also have hb(x, y).
```java
public class Singleton {

    private volatile static Singleton instance;

    private Singleton() {
    }

    public static Singleton getInstance() {
        if (instance == null) {
            synchronized (Singleton.class) {
                if (instance == null) {
                    instance = new Singleton();
                }
            }
        }
        return instance;
    }
}
```
The lazy-initialization (double-checked locking) singleton became a classic because of a bug. The first instance == null check avoids useless contention for the Singleton.class lock. But since that first check is not synchronized, several threads may "simultaneously" see true. The Singleton.class lock protects the second instance == null check, and the volatile on instance is essential: the synchronized block here only protects the reference, not the object's construction, and volatile prevents a reference to a not-yet-initialized Singleton from being observed.
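An alternative that needs neither volatile nor explicit locking is the initialization-on-demand holder idiom, which leans on the JVM's thread-safe, lazy class initialization:

```java
public class HolderSingleton {
    private HolderSingleton() {
    }

    // The JVM initializes Holder (and thus INSTANCE) lazily and thread-safely
    // on the first call to getInstance().
    private static class Holder {
        static final HolderSingleton INSTANCE = new HolderSingleton();
    }

    public static HolderSingleton getInstance() {
        return Holder.INSTANCE;
    }
}
```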
One benefit of using singletons is resource savings.
When mutual-exclusion locks are used for synchronization, deadlock is a common problem. The four conditions for deadlock are mutual exclusion, hold-and-wait, no preemption, and circular wait (a minimal example follows this list):
Deadlock prevention: break one of the four conditions above.
Deadlock avoidance: avoid deadlock at run time, e.g. the banker's algorithm.
Deadlock detection and recovery.
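The minimal deadlock mentioned above: two threads take the same two monitors in opposite order; jstack will report the cycle:

```java
public class DeadlockDemo {
    private static final Object A = new Object();
    private static final Object B = new Object();

    public static void main(String[] args) {
        new Thread(() -> lockBoth(A, B), "t1").start();
        new Thread(() -> lockBoth(B, A), "t2").start();
    }

    private static void lockBoth(Object first, Object second) {
        synchronized (first) {
            try { Thread.sleep(100); } catch (InterruptedException ignored) { }
            // Circular wait: each thread holds one monitor and wants the other.
            synchronized (second) {
                System.out.println("never reached");
            }
        }
    }
}
```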
This refers to HotSpot's optimizations of synchronized. Since these optimizations are JVM-implementation specific, most of the underlying details require reading the JVM's (C++) source code.
They include:
Enable naive spinning on Java monitor before entering operating system thread synchronization code.
CAS, i.e. Compare-And-Swap. Nearly every thread-safety mechanism bottoms out in CAS. The JVM implements CAS with the processor's CAS instruction; in a database, the same idea is an optimistic update such as UPDATE t SET value = 'n', version = version + 1 WHERE version = 2.
If the compared value is not monotonic (e.g. increasing), a value that changed A→B→A can still pass the CAS; if that is a problem (the ABA problem), use a monotonically increasing version number.
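In Java, AtomicStampedReference pairs the reference with exactly such a monotonically increasing stamp — a minimal sketch of detecting A→B→A:

```java
import java.util.concurrent.atomic.AtomicStampedReference;

public class AbaDemo {
    public static void main(String[] args) {
        AtomicStampedReference<String> ref = new AtomicStampedReference<>("A", 0);
        int stamp = ref.getStamp();

        // Some other thread does A -> B -> A, bumping the stamp each time.
        ref.compareAndSet("A", "B", stamp, stamp + 1);
        ref.compareAndSet("B", "A", stamp + 1, stamp + 2);

        // A plain compare of the value would succeed, but the stale stamp fails the CAS.
        boolean ok = ref.compareAndSet("A", "C", stamp, stamp + 1);
        System.out.println(ok); // false: the intermediate change was detected
    }
}
```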
AbstractQueuedSynchronizer (AQS) is the abstract base class behind many synchronizer implementations.
In fair ReentrantLock, the synchronizer FairSync's lock() ends up in acquire(1):
```java
private volatile int state;
private transient volatile Node head; // a Node wraps a thread waiting for the resource;
private transient volatile Node tail; // the Node next/prev fields form a queue

public final void acquire(int arg) {
    if (!tryAcquire(arg) &&                            // return immediately once the lock is acquired
        acquireQueued(addWaiter(Node.EXCLUSIVE), arg)) // otherwise join the queue
        selfInterrupt();
}

protected final boolean tryAcquire(int acquires) {
    final Thread current = Thread.currentThread();
    int c = getState();
    if (c == 0) {
        // Even if this check goes stale due to a race, the thread is still
        // enqueued and handled in acquireQueued(). The fair policy shows here:
        // the thread competes for the lock only if !hasQueuedPredecessors().
        if (!hasQueuedPredecessors() &&
            compareAndSetState(0, acquires)) { // lock acquired, so setExclusiveOwnerThread is safe
            setExclusiveOwnerThread(current);
            return true;
        }
    }
    else if (current == getExclusiveOwnerThread()) { // reentrancy
        int nextc = c + acquires;
        if (nextc < 0)
            throw new Error("Maximum lock count exceeded");
        setState(nextc);
        return true;
    }
    return false;
}
```
```java
private Node addWaiter(Node mode) {
    Node node = new Node(Thread.currentThread(), mode);
    // Try the fast path of enq; backup to full enq on failure
    Node pred = tail;
    if (pred != null) {
        node.prev = pred;
        if (compareAndSetTail(pred, node)) { // the CAS protects setTail and pred.next = node
            pred.next = node;
            return node;
        }
    }
    enq(node);
    return node;
}

private Node enq(final Node node) { // spin to enqueue
    for (;;) {
        Node t = tail;
        if (t == null) { // Must initialize
            if (compareAndSetHead(new Node())) // dummy head node
                tail = head;
        } else {
            node.prev = t;
            if (compareAndSetTail(t, node)) { // the CAS protects setTail and t.next = node
                t.next = node;
                return t;
            }
        }
    }
}
```
```java
final boolean acquireQueued(final Node node, int arg) {
    boolean failed = true;
    try {
        boolean interrupted = false;
        for (;;) {
            final Node p = node.predecessor();
            if (p == head && tryAcquire(arg)) { // if the p == head check goes stale, the spin re-checks
                setHead(node);
                p.next = null; // help GC
                failed = false;
                return interrupted;
            }
            if (shouldParkAfterFailedAcquire(p, node) &&
                parkAndCheckInterrupt()) // when park returns, the loop is re-entered
                interrupted = true;
        }
    } finally {
        if (failed)
            cancelAcquire(node);
    }
}

private static boolean shouldParkAfterFailedAcquire(Node pred, Node node) {
    int ws = pred.waitStatus;
    if (ws == Node.SIGNAL)
        /*
         * This node has already set status asking a release
         * to signal it, so it can safely park.
         */
        return true;
    if (ws > 0) {
        /*
         * Predecessor was cancelled. Skip over predecessors and
         * indicate retry.
         */
        do {
            node.prev = pred = pred.prev;
        } while (pred.waitStatus > 0);
        pred.next = node;
    } else {
        /*
         * waitStatus must be 0 or PROPAGATE. Indicate that we
         * need a signal, but don't park yet. Caller will need to
         * retry to make sure it cannot acquire before parking.
         */
        // Setting SIGNAL first and looping again before reaching
        // parkAndCheckInterrupt() avoids the race of parking after a missed release.
        compareAndSetWaitStatus(pred, ws, Node.SIGNAL);
    }
    return false;
}

private final boolean parkAndCheckInterrupt() {
    LockSupport.park(this);
    return Thread.interrupted();
}
```
LockSupport.park takes a blocker parameter:
```java
public static void park(Object blocker) {
    Thread t = Thread.currentThread();
    setBlocker(t, blocker);
    UNSAFE.park(false, 0L);
    setBlocker(t, null);
}
```
Its purpose:
This object is recorded while the thread is blocked to permit monitoring and diagnostic tools to identify the reasons that threads are blocked. (Such tools may access blockers using method getBlocker(Thread).)
This lets tools such as jstack show where threads are blocked and where deadlocks occur.
waitStatus:
```
 * Status field, taking on only the values:
 *   SIGNAL:     The successor of this node is (or will soon be)
 *               blocked (via park), so the current node must
 *               unpark its successor when it releases or
 *               cancels. To avoid races, acquire methods must
 *               first indicate they need a signal,
 *               then retry the atomic acquire, and then,
 *               on failure, block.
 *   CANCELLED:  This node is cancelled due to timeout or interrupt.
 *               Nodes never leave this state. In particular,
 *               a thread with cancelled node never again blocks.
 *   CONDITION:  This node is currently on a condition queue.
 *               It will not be used as a sync queue node
 *               until transferred, at which time the status
 *               will be set to 0. (Use of this value here has
 *               nothing to do with the other uses of the
 *               field, but simplifies mechanics.)
 *   PROPAGATE:  A releaseShared should be propagated to other
 *               nodes. This is set (for head node only) in
 *               doReleaseShared to ensure propagation
 *               continues, even if other operations have
 *               since intervened.
 *   0:          None of the above
```
Synchronized blocks: the monitorenter and monitorexit instructions.
Synchronized methods: the ACC_SYNCHRONIZED method flag.
Most important is the ObjectMonitor object implemented in C++ inside the VM, part of how the JVM realizes synchronization with monitors. A monitor is a high-level synchronization primitive (equivalent in power to semaphores) intended to make correct programs easier to write. The JDK's Lock (AQS) is also a monitor implementation.
From the above, the most fundamental difference between synchronized and Lock: synchronized is a feature of the Java language, which only prescribes its semantics, leaving the JVM free to implement and optimize it as it sees fit; Lock is merely a JDK library, built essentially from volatile + a Node queue (the sync queue, plus a condition queue held via the nextWaiter field when conditions are used) + CAS (native) + spinning + LockSupport.park/unpark (native underneath).
Comparing the JVM implementation with the Lock implementation:
- | synchronized | Lock |
---|---|---|
Interruption | not supported | supported |
Timeout | not supported | supported |
tryLock | not supported | supported |
Fairness | unfair | both |
Reentrancy | supported | supported |
Conditions | one | multiple |
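The Lock-only capabilities from the table above in one sketch — timed tryLock and interruptible acquisition:

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class LockFeaturesDemo {
    public static void main(String[] args) throws InterruptedException {
        ReentrantLock lock = new ReentrantLock(true); // fair mode: not possible with synchronized

        if (lock.tryLock(1, TimeUnit.SECONDS)) { // timed acquisition
            try {
                // critical section
            } finally {
                lock.unlock();
            }
        }

        lock.lockInterruptibly(); // acquisition that responds to Thread.interrupt()
        try {
            // critical section
        } finally {
            lock.unlock();
        }
    }
}
```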
See https://liuweiqiang.me/2021/11/17/java-thread-source.html
Applications:
See https://liuweiqiang.me/2020/09/08/qs&tree.html
ThreadLocal is used for accessing context within a request's lifecycle; Spring's @Transactional also uses it to hold the transaction context.
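A typical context holder (the names are illustrative, not from any particular framework):

```java
public final class RequestContext {
    private static final ThreadLocal<String> CURRENT_USER = new ThreadLocal<>();

    private RequestContext() {
    }

    public static void set(String user) { CURRENT_USER.set(user); }

    public static String get() { return CURRENT_USER.get(); }

    // Always call remove() in a finally block: pooled threads are reused,
    // and a leftover value would leak into the next request.
    public static void clear() { CURRENT_USER.remove(); }
}
```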
https://docs.oracle.com/en/java/javase/21/core/virtual-threads.html
The JVM actually has little to do with the Java language itself, but both are tightly coupled to class-file bytecode.
The JVM specification describes the behavior of an abstract machine, not any particular widely used implementation. Implementations are free to decide details the spec leaves open: how the run-time data areas are laid out, which garbage collection algorithm to use, optimizations, and so on. See 《Java虚拟机规范(Java SE 8版)》.
The run-time data areas hold the data used at run time; some are tied to the VM process's lifetime, others to a thread's.
They are a conceptual model: a concrete JVM need not follow the model literally and may realize it through more efficient equivalent means.
Which area holds the string constant pool is debatable. In my own test, String.valueOf(i++).intern() with MaxMetaspaceSize tuned never produced a Metaspace OOM (Zulu OpenJDK).
The following method likewise showed no visible growth in any area's usage during intern (observed with jconsole):
```java
public static void main(String[] args) throws Exception {
    int size = 11116384;
    Set<String> set = new HashSet<>(size + 1);
    int count = 0;
    String zero = String.valueOf(size - 1);
    set.add(zero);
    String copy = String.valueOf(size - 1);
    if (zero != copy.intern() && copy == zero.intern()) {
        System.out.println("aaa");
    }
    for (int i = size; i < 2 * size; i++) {
        set.add(String.valueOf(i));
    }
    for (String s : set) {
        if (s == s.intern()) {
            count++;
        }
    }
    System.out.println(set.size());
    System.out.println(count);
}
```
The sizes can be inspected with jstat -gccapacity.
PS: the sizes configured here are not what the VM requests from the operating system at startup.
jmap -dump:
Once a heap dump is obtained, it can be analyzed with jhat (histogram, rootset, etc.):
jhat -port 1099 dump.test
When opening resources, good practice is to release them with try-with-resources or finally.
Stack problems are generally easy to troubleshoot.
A Metaspace (method area) OOM is also reported as a Metaspace OOM, usually caused by:
At run time, the HotSpot NativeMemoryTracking option (rarely enabled in production) can help narrow down which region is at fault.
Some parameters can be changed at run time with jinfo (the manageable ones).
Garbage collection and memory allocation are closely related; choosing allocation and collection algorithms usually requires weighing them together.
Because Java's memory management is automatic, it frees application developers from manual management.
When an allocation finds insufficient space (different areas trigger different degrees of collection), the VM must first pause all user threads (Stop-The-World) to enumerate the root nodes, then analyze object reachability from the GC Roots so unreachable objects can be reclaimed. User threads may run concurrently in between; Stop-The-World guarantees that marked objects cannot gain new references meanwhile. Two ways to reduce this pause:
Typical GC Roots include:
For generational and other partial-collection algorithms, since only part of the heap is collected and other regions may hold references into it, those associated (suitably coarse-grained) regions must also be added to the GC Roots.
The classic algorithms include:
Some of this relates to https://liuweiqiang.me/2020/09/08/qs&tree.html
Choosing a collector means balancing pause time, throughput, and multiprocessor utilization; trading servers typically used ParNew + CMS for low pause times.
Triggers:
Class loading has three phases:
The JVM specification (jvms) defines two kinds of class loaders:
Typical VM implementations (JDK 8) follow the parent-delegation model:
When a class loader receives a loading request, it does not try to load the class itself first but delegates the request to its parent, and so on at every level, so every request eventually reaches the top-level bootstrap loader; only when the parent reports that it cannot complete the load (the class is not within its search scope) does the child attempt the load itself.
Parent and child here refer to the delegation relationship, not class inheritance.
The benefits of parent delegation are:
It has to be judged case by case:
Approach | Pros | Cons |
---|---|---|
System.currentTimeMillis() | simple | intrusive and tedious |
StopWatch | simple and more convenient | intrusive and tedious |
AOP | less intrusive | cannot intercept private or static methods |
Java Agent, Attach API (Arthas trace) | non-intrusive | higher barrier; mind the ordering of and conflicts between multiple agents |
The debugger is the most common technique during development; it is usually not allowed in production.
If the remote service supports it, remote debugging (or Arthas watch) works too.
Logs include application logs and middleware logs; common in testing, while production may suppress lower-level logs.
Troubleshooting application problems requires fairly complete logging. My personal habit is to log all of a method's inputs: parameters, DAO return values, and so on.
Sometimes a log line is missing; the JVM's APIs (Java Agent, Attach API) allow run-time enhancement, e.g. BTrace or Arthas retransform.
Thread dumps are commonly used when a Java application deadlocks or blocks.
ps -emT | more # list processes together with their threads
ps -eT Huk -pcpu | more # list threads sorted by CPU usage
Check process and thread CPU usage, then correlate with the jstack output.
Flame graphs from profiling tools are another option.
Heap dumps are commonly used when analyzing OOMs and memory leaks; see the Runtime Data Area section.
GC logs are likewise common for memory problems; they require JVM flags (e.g. -XX:+PrintGCDetails -Xloggc:<file>; on JDK 9+, -Xlog:gc*).
With the GC log in hand, analyzers such as gceasy.io help; jstat is also useful alongside.
If the young generation GCs too frequently, consider enlarging it.
RFC 793
The TCP/IP view has four layers:
OSI defines seven layers:
Accounts vary; rather than asking how reliability is implemented, it is arguably more useful to pin down why the underlying network is unreliable, which Computer Networking: A Top-Down Approach explains well:
From an implementation standpoint: being connection-oriented, the two sides must exchange initial control information (ISNs and the like), so at least the first two handshakes are needed. TCP is reliable even for control segments (hence a SYN consumes a sequence number), so ACKs are mandatory and each side needs at least one. Because the server's SYN and ACK can ride in the same segment, the handshake ends up being three-way.
In special cases the handshake can take four segments, e.g. a simultaneous open.
Reliable SYN delivery also brings a benefit for error handling: the receiver of an ACK (the connection initiator) can recognize a bogus packet — if the sequence number the ACK acknowledges is not recorded on the receiver's side, it is an error.
As for teardown, half-close must be allowed, so the ACK of the first FIN is sent by the kernel's TCP stack, while the second FIN is generally initiated by the application layer according to application logic.
The side that initiates the close performs the active close and, after its packets are processed, enters TIME_WAIT (2MSL), so the FIN or the active closer's ACK can be retransmitted if either is lost. TIME_WAIT also lets the connection drain, preventing old segments from straying into a new connection.
The passively closing side performs the passive close; its half-close state is CLOSE_WAIT.
An HTTP message consists of a request line (request) / status line (response) + headers + a blank line + the message body.
Status codes:
Request methods:
- | GET | POST |
---|---|---|
Caching | yes | generally unsupported |
Server state | unchanged | changed |
Request parameters | in the URL and headers, so length and encoding are restricted | in the body |
Repeating the request | no side effects | side effects |
Other browser behaviors | - | - |
See https://liuweiqiang.me/2021/12/23/when-implements-ssl-client.html
The root name servers' IPs are hard-coded (shipped with the operating system/resolver).
Some notes here too: https://liuweiqiang.me/2022/08/29/ddd-&-hibernate.html
Commonly used frameworks and middleware (see Spring Cloud for reference):
If your Redis client is Redisson, it is best to use its ready-made lock directly.
The basic idea is a uniqueness constraint — a Redis key or an RDBMS unique index: insert to lock, delete to unlock.
With Redis, the unlock generally needs a Lua script; ARGV carries a token so the lock cannot be released by a non-owner:
```lua
if redis.call("get", KEYS[1]) == ARGV[1] then
    return redis.call("del", KEYS[1])
else
    return 0
end
```
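Acquisition is the mirror image: a single SET with NX and an expiry, the token remembered for the unlock script. Sketched here with the Jedis client (method names as in Jedis 3.x; adapt to whatever client you use):

```java
import java.util.Collections;
import java.util.UUID;

import redis.clients.jedis.Jedis;
import redis.clients.jedis.params.SetParams;

public class RedisLockSketch {
    private static final String UNLOCK_LUA =
            "if redis.call('get', KEYS[1]) == ARGV[1] then " +
            "  return redis.call('del', KEYS[1]) " +
            "else return 0 end";

    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            String token = UUID.randomUUID().toString(); // identifies the owner on unlock
            String reply = jedis.set("my-lock", token, SetParams.setParams().nx().px(30_000));
            if ("OK".equals(reply)) { // a null reply means someone else holds the lock
                try {
                    // protected section
                } finally {
                    jedis.eval(UNLOCK_LUA, Collections.singletonList("my-lock"),
                            Collections.singletonList(token));
                }
            }
        }
    }
}
```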
Redis officially describes a more reliable algorithm: Redlock (for high availability).
Since typical distributed locks provide nothing like fencing tokens to protect the resource itself, they can only reduce contention on a resource, not strictly prevent concurrent access.
The more mature schemes mitigate this with rate limiting or message queues.
https://liuweiqiang.me/2019/01/28/database-note.html & https://liuweiqiang.me/2022/12/28/consistency.html
Intention locks make conflict detection between table locks (explicit or implicit) and row locks efficient; before taking a row lock, the corresponding intention lock must be taken first. Their compatibility:
- | X (table) | IX | S (table) | IS |
---|---|---|---|---|
X (table) | Conflict | Conflict | Conflict | Conflict |
IX | Conflict | Compatible | Conflict | Compatible |
S (table) | Conflict | Conflict | Compatible | Compatible |
IS | Conflict | Compatible | Compatible | Compatible |
Insert intention locks implement conflict detection between inserts and gap locks.
Indexes are mainly implemented as B+ trees (the primary key included); see https://liuweiqiang.me/2020/09/08/qs&tree.html
InnoDB indexes:
Index Class | Index Type |
---|---|
Primary key | BTREE |
Unique | BTREE |
Key | BTREE |
FULLTEXT | N/A |
SPATIAL | N/A |
HASH indexes appear only in the MEMORY and NDB storage engines.
Also called the primary index: the order of the records matches the order of this index. In MySQL (InnoDB) the primary key is clustered and is itself a B+ tree, so finding the key value yields the full record. With a non-clustered (secondary) index, the search yields the primary key value, and a second lookup against the primary index is needed ("going back to the table").
- When you define a PRIMARY KEY on a table, InnoDB uses it as the clustered index. A primary key should be defined for each table. If there is no logical unique and non-null column or set of columns to use as the primary key, add an auto-increment column. Auto-increment column values are unique and are added automatically as new rows are inserted.
- If you do not define a PRIMARY KEY for a table, InnoDB uses the first UNIQUE index with all key columns defined as NOT NULL as the clustered index.
- If a table has no PRIMARY KEY or suitable UNIQUE index, InnoDB generates a hidden clustered index named GEN_CLUST_INDEX on a synthetic column that contains row ID values. The rows are ordered by the row ID that InnoDB assigns. The row ID is a 6-byte field that increases monotonically as new rows are inserted. Thus, the rows ordered by the row ID are physically in order of insertion.
- | MySQL | Oracle |
---|---|---|
Default port | 3306 | 1521 |
Isolation levels | the usual four | two (Read Committed, Serializable) plus Read Only |
One way to analyze MySQL is its built-in slow query log (slow_query_log). The server uses the controlling parameters in the following order to determine whether to write a query to the slow query log:
Here long_query_time refers to the actual execution time (total elapsed time minus lock-wait time).
Besides the slow query log, monitoring systems provide observability (OpenTelemetry).
Inspect the execution plan. Many levers can improve a statement's performance, such as tuning MySQL variables, but what developers most often do is optimize the DML statement and the table structure.
Including but not limited to:
MySQL also offers ways to influence the execution plan, for example:
Without ICP, the storage engine traverses the index to locate rows in the base table and returns them to the MySQL server which evaluates the WHERE condition for the rows. With ICP enabled, and if parts of the WHERE condition can be evaluated by using only columns from the index, the MySQL server pushes this part of the WHERE condition down to the storage engine. The storage engine then evaluates the pushed index condition by using the index entry and only if this is satisfied is the row read from the table.
EXPLAIN output shows Using index condition in the Extra column when Index Condition Pushdown is used.
Index Condition Pushdown is enabled by default.
When tuning with EXPLAIN you may see Using temporary in the Extra column: temporary data created during execution. It can appear in the following cases:
- Evaluation of UNION statements, with some exceptions described later.
- Evaluation of some views, such as those that use the TEMPTABLE algorithm, UNION, or aggregation.
- Evaluation of derived tables.
- Tables created for subquery or semijoin materialization.
- Evaluation of statements that contain an ORDER BY clause and a different GROUP BY clause, or for which the ORDER BY or GROUP BY contains columns from tables other than the first table in the join queue.
- Evaluation of DISTINCT combined with ORDER BY may require a temporary table.
- For queries that use the SQL_SMALL_RESULT modifier, MySQL uses an in-memory temporary table, unless the query also contains elements (described later) that require on-disk storage.
- To evaluate INSERT ... SELECT statements that select from and insert into the same table, MySQL creates an internal temporary table to hold the rows from the SELECT, then inserts those rows into the target table.
- Evaluation of multiple-table UPDATE statements.
- Evaluation of GROUP_CONCAT() or COUNT(DISTINCT) expressions.
These conditions qualify a UNION for evaluation without a temporary table:
- The union is UNION ALL, not UNION or UNION DISTINCT.
- There is no global ORDER BY clause.
- The union is not the top-level query block of an {INSERT | REPLACE} ... SELECT ... statement.
The storage engines available for temporary tables are:
- | drop | truncate | delete |
---|---|---|---|
Type | DDL | DDL | DML |
Rollback | no | no | yes |
Metadata | removed | reset | unchanged |
Speed | fast | fairly fast | slower |
Triggers | not fired | not fired | fired |
Questions to consider (ideally together with a DBA):
There is also the bitmap type (BitSet in Java):
When key does not exist, a new string value is created. The string is grown to make sure it can hold a bit at offset. The offset argument is required to be greater than or equal to 0, and smaller than 2^32 (this limits bitmaps to 512MB). When the string at key is grown, added bits are set to 0.
And HyperLogLog (approximate distinct counting).
Redis objects can be encoded in different ways:
Strings can be encoded as:
raw, normal string encoding.
int, strings representing integers in a 64-bit signed interval, encoded in this way to save space.
embstr, an embedded string, which is an object where the internal simple dynamic string, sds, is an unmodifiable string allocated in the same chunk as the object itself. embstr can be strings with lengths up to the hardcoded limit of OBJ_ENCODING_EMBSTR_SIZE_LIMIT or 44 bytes.
Lists can be encoded as ziplist or linkedlist. The ziplist is the special representation that is used to save space for small lists.
Sets can be encoded as intset or hashtable. The intset is a special encoding used for small sets composed solely of integers.
Hashes can be encoded as ziplist or hashtable. The ziplist is a special encoding used for small hashes.
Sorted Sets can be encoded as ziplist or skiplist format. As for the List type small sorted sets can be specially encoded using ziplist, while the skiplist encoding is the one that works with sorted sets of any size.
All the specially encoded types are automatically converted to the general type once you perform an operation that makes it impossible for Redis to retain the space saving encoding.
Thanks to keeping data in memory, Redis executes commands efficiently on a single thread:
Redis is, mostly, a single-threaded server from the POV of commands execution.
For more performance, besides I/O multiplexing, Redis 6 added I/O threads for network I/O, while command execution stays on the original main thread.
Standalone: low availability.
Master/replica: the replica sends PSYNC, the master sends an RDB snapshot plus the incremental stream; no automatic failover.
PSYNC replicationid offset
The replicationid can be thought of as a term (in the consensus sense), except that it is pseudo-randomly generated.
When replicas connect to masters, they use the PSYNC command to send their old master replication ID and the offsets they processed so far.
This way the master can send just the incremental part needed.
However if there is not enough backlog in the master buffers, or if the replica is referring to an history (replication ID) which is no longer known, then a full resynchronization happens.
It is not possible to partially sync a replica that restarted via the AOF file.
However the instance may be turned to RDB persistence before shutting down it, then can be restarted, and finally AOF can be enabled again.
Sentinel: concurrency is still limited, capacity still small, machine utilization low.
Cluster: e.g. three masters and three replicas.
CLUSTER NODES
shows the cluster's node information.
Every Redis Cluster node requires two open TCP connections: a Redis TCP port used to serve clients, e.g., 6379, and second port known as the cluster bus port. By default, the cluster bus port is set by adding 10000 to the data port (e.g., 16379); however, you can override this in the cluster-port configuration.
Cluster bus is a node-to-node communication channel that uses a binary protocol, which is more suited to exchanging information between nodes due to little bandwidth and processing time. Nodes use the cluster bus for failure detection, configuration updates, failover authorization, and so forth. Clients should never try to communicate with the cluster bus port, but rather use the Redis command port. However, make sure you open both ports in your firewall, otherwise Redis cluster nodes won't be able to communicate.
Since the data is split across three masters, each owning part of it, the usual problems of request routing, rebalancing, and transactions arise.
Having chosen Redis, relational-database transaction semantics are all but unavailable.
Redis rebalances as follows: there is a fixed set of 16384 slots. When a new node joins the cluster it takes over a few slots from each existing node; when a node is removed, the reverse rebalancing happens. Whole slots migrate between nodes, while the total number of slots and the key-to-slot mapping (CRC16 of the key mod 16384) stay fixed; the only thing adjusted is which node owns which slot. Since moving data across the network takes time, the adjustment proceeds gradually, and the old owner keeps serving reads and writes in the meantime.
The Redis server itself supports only limited routing:
Use cases: hit counters, leaderboards, distributed caching, distributed locks, cluster-shared sessions, and so on.
When used as a cache, watch for the cases where caching does not fit:
For application configuration, business parameters, and similar data that changes rarely yet needs fairly strong consistency (mainly atomicity), consider adding a version number so the data becomes immutable, and then cache it.
Eviction policies when maxmemory is reached:
Redis reclaims expired keys in two ways: upon access when those keys are found to be expired, and also in background, in what is called the "active expire cycle".
If very strong consistency is required, only a consensus algorithm will do.
Failing that, schemes with a small probability of inconsistency:
Redis executes each command on a single thread, but a sequence of commands composed by the application is not atomic; use a Lua script for that.
Typical applications:
Producer: the message producer.
Consumer: the message consumer.
Broker: the message broker; receives messages from producers and delivers them to consumers; deployed as master/replica (multi-master is also possible):
Broker flush-to-disk modes:
NameServer: provides service discovery for producers and consumers; stateless.
Note also that the producer group name (ProducerGroupName) for transactional messages must not be chosen arbitrarily. Transactional messages have a check-back mechanism: if the broker finds the original producer has crashed, it contacts another producer instance in the same producer group to check the local transaction's outcome and commit or roll back the half message.
In the special cases of a network outage or a producer restart, if the server never receives the sender's second-phase confirmation, or receives an Unknown status, then after a fixed period the server initiates a check-back to any producer instance in the producer cluster.
No consensus is reached, so in theory this does not constitute atomic commit, but it is common industry practice.
The internal topic for transactional messages is RMQ_SYS_TRANS_HALF_TOPIC. The dead-letter queue holds messages that have exhausted their consumption retries; its queue name is %DLQ%ConsumerGroupName.
The basic approach is to make consumption outpace production:
If the backlog is temporary:
RocketMQ keeps a single commit log shared by all topics and queues, whereas Kafka gives each topic-partition its own directory (https://liuweiqiang.me/2022/03/16/kafka-in-action.html); hence RocketMQ's performance degrades less as topics and partitions multiply.
Service providers register themselves with the registry (ephemeral ZooKeeper nodes; the /dubbo children can be inspected with ZooKeeper's zkCli command-line tool). Service consumers fetch provider information from the registry at run time and cache it locally, so a registry outage does not affect already-running clients.
With ZooKeeper as the registry, service is unavailable for a while after the ZK leader fails, until a new leader is elected; other registries (such as Eureka) need no leader election and so offer higher availability.
The default RPC protocol is dubbo, with a Netty server and (binary) hessian2 serialization. https://cn.dubbo.apache.org/zh-cn/overview/mannual/java-sdk/reference-manual/protocol/dubbo/
The default dubbo protocol uses a single long-lived connection with NIO asynchronous communication; it suits high-concurrency calls with small payloads, and deployments where consumer machines far outnumber provider machines.
Other protocols include gRPC, RMI, and so on.
PS: Netty has nothing to do with Jetty. Jetty is a servlet container comparable to Tomcat; Netty builds on Java NIO and is an asynchronous event-driven network application framework.
Dubbo | Feign |
---|---|
RPC style, tighter coupling | RESTful |
Also covers load balancing, monitoring, etc. | needs other components |
Leans toward custom protocols, making high performance easier | HTTP |
Simply put, the caller publishes the interface.
Java's SPI centers on java.util.ServiceLoader, which registers implementations by reading the metadata under META-INF/services/ in jars. java.sql.Driver uses Java SPI.
Because this Java SPI method
```java
public static <S> ServiceLoader<S> load(Class<S> service) {
    ClassLoader cl = Thread.currentThread().getContextClassLoader();
    return ServiceLoader.load(service, cl);
}
```
uses the context class loader, it can break parent delegation.
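Typical usage — assuming a provider jar ships a file META-INF/services/com.example.Greeter whose content is the implementing class's binary name (all names here are illustrative):

```java
import java.util.ServiceLoader;

// The service interface the caller publishes (com.example.Greeter in the file name above).
interface Greeter {
    String greet(String name);
}

public class SpiDemo {
    public static void main(String[] args) {
        // Reads META-INF/services/<interface binary name> via the context class loader.
        ServiceLoader<Greeter> loader = ServiceLoader.load(Greeter.class);
        for (Greeter g : loader) { // providers are instantiated lazily during iteration
            System.out.println(g.greet("world"));
        }
    }
}
```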
Dubbo's SPI is similar to Java's, using metadata under META-INF/dubbo/.
Inversion of control inverts who controls object creation: the IoC container creates the objects.
Dependency injection (DI) is one way of implementing inversion of control.
- | Autowired | Resource |
---|---|---|
Package | org.springframework.beans.factory.annotation.Autowired | javax.annotation.Resource |
Multiple beans | needs @Primary or @Qualifier (byType strategy) | can use its own name attribute (byName) |
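Side by side, with two beans of the same type (Spring annotations; the bean names are illustrative):

```java
import javax.annotation.Resource;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.stereotype.Component;

interface PayClient { }

@Component("alipayClient")
class AlipayClient implements PayClient { }

@Component("wechatClient")
class WechatClient implements PayClient { }

@Component
class PayService {
    // By type first; @Qualifier narrows the match when several beans qualify.
    @Autowired
    @Qualifier("alipayClient")
    private PayClient byType;

    // By name: injects the bean registered as "wechatClient".
    @Resource(name = "wechatClient")
    private PayClient byName;
}
```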
Spring Boot's SPI mechanism works by scanning the META-INF/spring.factories metadata of jars on the ClassLoader [org.springframework.boot.autoconfigure.EnableAutoConfiguration=xxx (@Configuration)].
Spring Cloud adds a bootstrap.yml configuration; the precedence is as follows (see the demo project https://github.com/lvv9/spring-cloud-config):
The main resolution idea is to expose objects early; hence it cannot work when the mutually dependent beans all use constructor injection, and it currently supports only the singleton scope.
The second- and third-level caches deal with the problems proxies introduce: they expose an early proxy object while ensuring the early-exposed object is consistent with the final one (the one in the first-level cache), since post-processors such as BeanPostProcessor may still wrap the bean. See: Spring中的循环依赖及解决.
If the early-exposed object is inconsistent with the final one, Spring throws:
org.springframework.beans.factory.BeanCurrentlyInCreationException: Error creating bean with name 'serviceA': Bean with name 'serviceA' has been injected into other beans [serviceB] in its raw version as part of a circular reference, but has eventually been wrapped. This means that said other beans do not use the final version of the bean. This is often the result of over-eager type matching - consider using 'getBeanNamesForType' with the 'allowEagerInit' flag turned off, for example.
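A minimal cycle that field injection can resolve through the early-exposure caches, while the same cycle with constructor injection cannot be created at all:

```java
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

@Service
class ServiceA {
    @Autowired
    private ServiceB b; // injected after A is exposed early, so the cycle resolves
}

@Service
class ServiceB {
    @Autowired
    private ServiceA a;
    // If this were a constructor parameter (and ServiceA likewise took ServiceB
    // in its constructor), neither bean could be instantiated first and startup would fail.
}
```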
ZooKeeper's data model is a tree of znodes. Node types are the combinations along two dimensions: persistent/ephemeral × sequential/non-sequential.
Nodes can register watchers to use ZK's notification mechanism.
https://zookeeper.apache.org/doc/r3.8.1/zookeeperInternals.html
https://semver.org/lang/zh-CN/
Version format: MAJOR.MINOR.PATCH, with the numbers incremented as follows:
In DDIA, maintainability comprises: