Interpreting Infinispan / JGroups logs
The Connect2id server utilises Infinispan / JGroups for in-memory storage and caching of sessions, client registrations and other OpenID Connect / OAuth data.
In this document you can find example log messages generated by Infinispan / JGroups when the cluster is created, and when nodes get added or removed. You can use these examples to know what to look for when monitoring your Connect2id server cluster or debugging networking issues. If you need further help get in touch with our support. You can also post comments below.
Note that Infinispan / JGroups can also be monitored via the metrics
endpoint (see
infinispan.numClusterMembers and infinipsan.isCoordinator) and also via
JMX.
1. First node startup, cluster creation
The first Connect2id server node to startup will automatically become the cluster coordinator. This coordinator's role is not statically tied to host, and will be automatically be taken over by the next available server should the coordinator go offline.
On startup the Connect2id server queries the local Infinispan cache manager
and logs key parameters into logs/c2id-server.log
, such as status, cluster
name, local addresses and JGroups configuration file:
2015-10-29T10:54:26,786 INFO localhost-startStop-1 MAIN - [CM8007] Infinispan status: RUNNING
2015-10-29T10:54:26,786 INFO localhost-startStop-1 MAIN - [CM8009] Infinispan cluster name: ISPN
2015-10-29T10:54:26,786 INFO localhost-startStop-1 MAIN - [CM8010] Infinispan cluster local node logical address: c2id-node-1-18639
2015-10-29T10:54:26,787 INFO localhost-startStop-1 MAIN - [CM8019] Infinispan cluster local node physical address(es): [fe80:0:0:0:a9e:1ff:fe35:c615%2:36745]
2015-10-29T10:54:26,787 INFO localhost-startStop-1 MAIN - [CM8011] Infinispan cluster coordinator logical address: c2id-node-1-18639
2015-10-29T10:54:26,787 INFO localhost-startStop-1 MAIN - [CM8016] Infinispan cluster local node is coordinator: true
2015-10-29T10:54:26,787 INFO localhost-startStop-1 MAIN - [CM8012] Infinispan cluster members: [c2id-node-1-18639]
2015-10-29T10:54:26,788 INFO localhost-startStop-1 MAIN - [CM8013] Infinispan cluster distributed sync timeout: 240000
2015-10-29T10:54:26,788 INFO localhost-startStop-1 MAIN - [CM8014] Infinispan cluster JGroups configuration file: default-configs/default-jgroups-udp.xml
The startup of the individual Infinispan caches / maps is also recorded:
2015-10-29T10:54:26,788 INFO localhost-startStop-1 MAIN - [CM8006] Started Infinispan cache authzStore.codeMap in 16857 ms
2015-10-29T10:54:26,804 INFO localhost-startStop-1 MAIN - [CM8006] Started Infinispan cache authzStore.accessTokenMap in 16873 ms
2015-10-29T10:54:26,822 INFO localhost-startStop-1 MAIN - [CM8006] Started Infinispan cache authzStore.authzCache in 16891 ms
...
The first Connect2id server node to start up will establish a new cluster and become its coordinator. This log sample is for a JGroups stack based on IP / UDP multicast for the message transport, PING node discovery and the FD_SOCK and FD_ALL protocols for failure detection.
2015-10-29T10:54:10,921 DEBUG localhost-startStop-1 Configurator - set property UDP.bind_addr to default value /fe80:0:0:0:a9e:1ff:fe35:c615%2
2015-10-29T10:54:10,922 DEBUG localhost-startStop-1 Configurator - set property UDP.diagnostics_addr to default value /ff0e:0:0:0:0:0:75:75
2015-10-29T10:54:11,005 DEBUG localhost-startStop-1 UDP - sockets will use interface fe80:0:0:0:a9e:1ff:fe35:c615%2
2015-10-29T10:54:11,017 DEBUG localhost-startStop-1 UDP - socket information:
mcast_addr=ff0e:0:0:0:0:0:e406:708:46655, bind_addr=/fe80:0:0:0:a9e:1ff:fe35:c615%2, ttl=2
sock: bound to fe80:0:0:0:a9e:1ff:fe35:c615%2:36745, receive buffer size=212992, send buffer size=212992
mcast_sock: bound to fe80:0:0:0:a9e:1ff:fe35:c615%2:46655, send buffer size=212992, receive buffer size=212992
2015-10-29T10:54:11,026 DEBUG localhost-startStop-1 GMS - address=c2id-node-1-18639, cluster=ISPN, physical address=fe80:0:0:0:a9e:1ff:fe35:c615%2:36745
2015-10-29T10:54:26,041 DEBUG localhost-startStop-1 NAKACK2 - [c2id-node-1-18639 setDigest()]
existing digest: []
new digest: c2id-node-1-18639: [0 (0)]
resulting digest: c2id-node-1-18639: [0 (0)]
2015-10-29T10:54:26,042 DEBUG localhost-startStop-1 GMS - c2id-node-1-18639: installing view [c2id-node-1-18639|0] (1) [c2id-node-1-18639]
2015-10-29T10:54:26,044 DEBUG localhost-startStop-1 STABLE - resuming message garbage collection
2015-10-29T10:54:26,045 DEBUG localhost-startStop-1 FD_SOCK - c2id-node-1-18639: VIEW_CHANGE received: [c2id-node-1-18639]
2015-10-29T10:54:26,083 DEBUG localhost-startStop-1 STABLE - resuming message garbage collection
2015-10-29T10:54:26,084 DEBUG localhost-startStop-1 GMS - c2id-node-1-18639: created cluster (first member). My view is [c2id-node-1-18639|0], impl is org.jgroups.protocols.pbcast.CoordGmsImpl
...
2. Second node joining
When a second node is started, it will automatically broadcast a ping to find other pre-existing nodes under the same cluster name, and will join them.
The second node (c2id-node-2) will log this as follows:
2015-10-29T10:56:13,373 DEBUG localhost-startStop-1 UDP - socket information:
mcast_addr=ff0e:0:0:0:0:0:e406:708:46655,
bind_addr=/fe80:0:0:0:62eb:69ff:fe63:5fe2%2, ttl=2
sock: bound to fe80:0:0:0:62eb:69ff:fe63:5fe2%2:60851, receive buffer size=212992, send buffer size=212992
mcast_sock: bound to fe80:0:0:0:62eb:69ff:fe63:5fe2%2:46655, send buffer size=212992, receive buffer size=212992
2015-10-29T10:56:13,380 DEBUG localhost-startStop-1 GMS - address=c2id-node-2-61644, cluster=ISPN, physical address=fe80:0:0:0:62eb:69ff:fe63:5fe2%2:60851
2015-10-29T10:56:13,409 DEBUG localhost-startStop-1 GMS - c2id-node-2-61644: sending JOIN(c2id-node-2-61644) to c2id-node-1-18639
2015-10-29T10:56:13,521 DEBUG localhost-startStop-1 NAKACK2 - [c2id-node-2-61644 setDigest()]
existing digest: []
new digest: c2id-node-1-18639: [0 (0)], c2id-node-2-61644: [0 (0)]
resulting digest: c2id-node-1-18639: [0 (0)], c2id-node-2-61644: [0 (0)]
2015-10-29T10:56:13,522 DEBUG localhost-startStop-1 GMS - c2id-node-2-61644: installing view [c2id-node-1-18639|1] (2) [c2id-node-1-18639, c2id-node-2-61644]
2015-10-29T10:56:13,524 DEBUG localhost-startStop-1 FD_SOCK - c2id-node-2-61644: VIEW_CHANGE received: [c2id-node-1-18639, c2id-node-2-61644]
2015-10-29T10:56:13,532 DEBUG FD_SOCK pinger,c2id-node-2-61644 FD_SOCK - c2id-node-2-61644: ping_dest is c2id-node-1-18639, pingable_mbrs=[c2id-node-1-18639, c2id-node-2-61644]
The joining of the second node will also be recorded in the logs of the first node (c2id-node-1):
2015-10-29T10:56:13,477 DEBUG ViewHandler,c2id-node-1-18639 STABLE - suspending message garbage collection
2015-10-29T10:56:13,479 DEBUG ViewHandler,c2id-node-1-18639 STABLE - c2id-node-1-18639: resume task started, max_suspend_time=33000
2015-10-29T10:56:13,490 DEBUG Incoming-2,c2id-node-1-18639 GMS - c2id-node-1-18639: installing view [c2id-node-1-18639|1] (2) [c2id-node-1-18639, c2id-node-2-61644]
2015-10-29T10:56:13,492 DEBUG Incoming-2,c2id-node-1-18639 FD_SOCK - c2id-node-1-18639: VIEW_CHANGE received: [c2id-node-1-18639, c2id-node-2-61644]
2015-10-29T10:56:13,494 DEBUG FD_SOCK pinger,c2id-node-1-18639 FD_SOCK - c2id-node-1-18639: ping_dest is c2id-node-2-61644, pingable_mbrs=[c2id-node-1-18639, c2id-node-2-61644]
2015-10-29T10:56:13,586 DEBUG ViewHandler,c2id-node-1-18639 STABLE - resuming message garbage collection
3. Node graciously leaving cluster
When a Connect2id server node is shut down in a normal manner, this will
produce the following log messages (typically into logs/catalina.out
):
[CM8005] Stopping Infinispan...
[CM8015] Stopped Infinispan cache sessionStore.sessionMap in 15 ms
[CM8015] Stopped Infinispan cache op.regTokenCache in 25 ms
[CM8015] Stopped Infinispan cache authzStore.codeMap in 33 ms
[CM8015] Stopped Infinispan cache authzStore.authzCache in 42 ms
[CM8015] Stopped Infinispan cache op.regCache in 49 ms
[CM8015] Stopped Infinispan cache authzStore.accessTokenMap in 57 ms
[CM8015] Stopped Infinispan cache op.authSessionMap in 65 ms
[CM8015] Stopped Infinispan cache op.consentSessionMap in 75 ms
[CM8015] Stopped Infinispan cache sessionStore.subjectMap in 83 ms
29-Oct-2015 11:59:08.121 INFO [http-nio-8080-exec-5] org.infinispan.remoting.transport.jgroups.JGroupsTransport.stop ISPN000080: Disconnecting JGroups channel ISPN
29-Oct-2015 11:59:08.409 INFO [http-nio-8080-exec-5] org.infinispan.remoting.transport.jgroups.JGroupsTransport.stop ISPN000082: Stopping the RpcDispatcher for channel ISPN
And as seen from the coordinator (c2id-node-1), which remains online:
2015-10-29T11:59:08,181 DEBUG FD_SOCK pinger,c2id-node-1-18639 FD_SOCK - c2id-node-1-18639: peer c2id-node-2-61644 closed socket gracefully
2015-10-29T11:59:08,181 DEBUG FD_SOCK pinger,c2id-node-1-18639 FD_SOCK - c2id-node-1-18639: ping_dest is null, pingable_mbrs=[c2id-node-1-18639]
2015-10-29T11:59:08,185 DEBUG Incoming-2,c2id-node-1-18639 GMS - c2id-node-1-18639: installing view [c2id-node-1-18639|2] (1) [c2id-node-1-18639]
2015-10-29T11:59:08,186 DEBUG Incoming-2,c2id-node-1-18639 NAKACK2 - c2id-node-1-18639: removed c2id-node-2-61644 from xmit_table (not member anymore)
2015-10-29T11:59:08,186 DEBUG Incoming-2,c2id-node-1-18639 FD_SOCK - c2id-node-1-18639: VIEW_CHANGE received: [c2id-node-1-18639]
2015-10-29T11:59:18,229 DEBUG Timer-2,c2id-node-1-18639 UNICAST3 - c2id-node-1-18639: removing expired connection for c2id-node-2-61644 (10005 ms old) from send_table
2015-10-29T11:59:18,230 DEBUG Timer-2,c2id-node-1-18639 UNICAST3 - c2id-node-1-18639: removing expired connection for c2id-node-2-61644 (10005 ms old) from recv_table
4. Network disconnect or node crash
The following example was logged when a Connect2id server node gets disconnected from the cluster network. A node crash will output similar messages.
The node that became disconnected (network outage) will assume a new cluster
view with itself as the only member (logs/catalina.out
):
29-Oct-2015 12:10:35.354 INFO [Incoming-2,c2id-node-2-54714] org.infinispan.remoting.transport.jgroups.JGroupsTransport.viewAccepted ISPN000094: Received new cluster view for channel ISPN: [c2id-node-2-54714|4] (1) [c2id-node-2-54714]
As seen from the coordinator (first node), in logs/c2id-server.log
:
2015-10-29T12:10:30,192 DEBUG Timer-2,c2id-node-1-18639 FD_ALL - haven't received a heartbeat from c2id-node-2-54714 for 60039 ms, adding it to suspect list
2015-10-29T12:10:30,193 DEBUG Timer-2,c2id-node-1-18639 FD_ALL - suspecting [c2id-node-2-54714]
2015-10-29T12:10:35,199 DEBUG Timer-2,c2id-node-1-18639 FD_ALL - haven't received a heartbeat from c2id-node-2-54714 for 65042 ms, adding it to suspect list
2015-10-29T12:10:35,200 DEBUG Timer-2,c2id-node-1-18639 FD_ALL - suspecting [c2id-node-2-54714]
2015-10-29T12:10:35,249 DEBUG Incoming-2,c2id-node-1-18639 GMS - c2id-node-1-18639: installing view [c2id-node-1-18639|4] (1) [c2id-node-1-18639]
2015-10-29T12:10:35,249 DEBUG Incoming-2,c2id-node-1-18639 NAKACK2 - c2id-node-1-18639: removed c2id-node-2-54714 from xmit_table (not member anymore)
2015-10-29T12:10:35,250 DEBUG Incoming-2,c2id-node-1-18639 FD_SOCK - c2id-node-1-18639: VIEW_CHANGE received: [c2id-node-1-18639]
2015-10-29T12:10:35,252 DEBUG FD_SOCK pinger,c2id-node-1-18639 FD_SOCK - c2id-node-1-18639: socket to null was closed gracefully
2015-10-29T12:10:35,252 DEBUG FD_SOCK pinger,c2id-node-1-18639 FD_SOCK - c2id-node-1-18639: ping_dest is null, pingable_mbrs=[c2id-node-1-18639]
2015-10-29T12:10:45,124 DEBUG Timer-3,c2id-node-1-18639 UNICAST3 - c2id-node-1-18639: removing expired connection for c2id-node-2-54714 (10006 ms old) from send_table
2015-10-29T12:10:45,124 DEBUG Timer-3,c2id-node-1-18639 UNICAST3 - c2id-node-1-18639: removing expired connection for c2id-node-2-54714 (10006 ms old) from recv_table
And in logs/catalina.out
:
29-Oct-2015 12:10:35.254 INFO [Incoming-2,c2id-node-1-18639] org.infinispan.remoting.transport.jgroups.JGroupsTransport.viewAccepted ISPN000094: Received new cluster view for channel ISPN: [c2id-node-1-18639|4] (1) [c2id-node-1-18639]
5. Split brain warning
A split brain warning occurs when a number of nodes disappear and the JGroups software has grounds to suspect that the cluster has become partitioned:
29-Oct-2015 12:10:35.256 WARN [transport-thread--p2-t13] infinispan.org.jboss.logging.JDKLogger.doLogf ISPN000314: Cache sessionStore.sessionMap lost at least half of the stable members, possible split brain causing data inconsistency. Current members are [c2id-node-1-18639], lost members are [c2id-node-2-54714], stable members are [c2id-node-1-18639, c2id-node-2-54714]
29-Oct-2015 12:10:35.259 WARN [transport-thread--p2-t13] <unknown>.<unknown> ISPN000314: Cache op.regTokenCache lost at least half of the stable members, possible split brain causing data inconsistency. Current members are [c2id-node-1-18639], lost members are [c2id-node-2-54714], stable members are [c2id-node-1-18639, c2id-node-2-54714]
29-Oct-2015 12:10:35.260 WARN [transport-thread--p2-t13] <unknown>.<unknown> ISPN000314: Cache authzStore.codeMap lost at least half of the stable members, possible split brain causing data inconsistency. Current members are [c2id-node-1-18639], lost members are [c2id-node-2-54714], stable members are [c2id-node-1-18639, c2id-node-2-54714]
29-Oct-2015 12:10:35.261 WARN [transport-thread--p2-t13] <unknown>.<unknown> ISPN000314: Cache authzStore.authzCache lost at least half of the stable members, possible split brain causing data inconsistency. Current members are [c2id-node-1-18639], lost members are [c2id-node-2-54714], stable members are [c2id-node-1-18639, c2id-node-2-54714]
29-Oct-2015 12:10:35.262 WARN [transport-thread--p2-t13] <unknown>.<unknown> ISPN000314: Cache op.regCache lost at least half of the stable members, possible split brain causing data inconsistency. Current members are [c2id-node-1-18639], lost members are [c2id-node-2-54714], stable members are [c2id-node-1-18639, c2id-node-2-54714]
29-Oct-2015 12:10:35.265 WARN [transport-thread--p2-t13] org.infinispan.partitionhandling.impl.PreferAvailabilityStrategy.checkForLostData ISPN000314: Cache authzStore.accessTokenMap lost at least half of the stable members, possible split brain causing data inconsistency. Current members are [c2id-node-1-18639], lost members are [c2id-node-2-54714], stable members are [c2id-node-1-18639, c2id-node-2-54714]
29-Oct-2015 12:10:35.266 WARN [transport-thread--p2-t13] org.infinispan.partitionhandling.impl.PreferAvailabilityStrategy.checkForLostData ISPN000314: Cache op.authSessionMap lost at least half of the stable members, possible split brain causing data inconsistency. Current members are [c2id-node-1-18639], lost members are [c2id-node-2-54714], stable members are [c2id-node-1-18639, c2id-node-2-54714]
29-Oct-2015 12:10:35.267 WARN [transport-thread--p2-t13] org.infinispan.partitionhandling.impl.PreferAvailabilityStrategy.checkForLostData ISPN000314: Cache op.consentSessionMap lost at least half of the stable members, possible split brain causing data inconsistency. Current members are [c2id-node-1-18639], lost members are [c2id-node-2-54714], stable members are [c2id-node-1-18639, c2id-node-2-54714]
29-Oct-2015 12:10:35.270 WARN [transport-thread--p2-t13] org.infinispan.partitionhandling.impl.PreferAvailabilityStrategy.checkForLostData ISPN000314: Cache sessionStore.subjectMap lost at least half of the stable members, possible split brain causing data inconsistency. Current members are [c2id-node-1-18639], lost members are [c2id-node-2-54714], stable members are [c2id-node-1-18639, c2id-node-2-54714]
6. Disconnected node rejoins cluster
If the disconnected node gets reconnected to the network, it will rejoin using the configured MERGE3 protocol:
As logged by the rejoining node (c2id-node-1), in c2id-server.log
:
2015-10-29T12:22:35,278 DEBUG INT-1,c2id-node-2-54714 STABLE - suspending message garbage collection
2015-10-29T12:22:35,279 DEBUG INT-1,c2id-node-2-54714 STABLE - c2id-node-2-54714: resume task started, max_suspend_time=220000
2015-10-29T12:22:35,292 DEBUG Incoming-3,c2id-node-2-54714 NAKACK2 - [c2id-node-2-54714 mergeDigest()]
existing digest: c2id-node-2-54714: [5 (5)]
new digest: c2id-node-1-18639: [10 (10)], c2id-node-2-54714: [4 (4)]
resulting digest: c2id-node-1-18639: [10 (10)], c2id-node-2-54714: [5 (5)]
2015-10-29T12:22:35,293 DEBUG Incoming-3,c2id-node-2-54714 GMS - c2id-node-2-54714: installing view MergeView::[c2id-node-1-18639|5] (2) [c2id-node-1-18639, c2id-node-2-54714], 2 subgroups: [c2id-node-2-54714|4] (1) [c2id-node-2-54714], [c2id-node-1-18639|4] (1) [c2id-node-1-18639]
2015-10-29T12:22:35,294 WARN Incoming-1,c2id-node-2-54714 NAKACK2 - JGRP000011: c2id-node-2-54714: dropped message 11 from non-member c2id-node-1-18639 (view=[c2id-node-2-54714|4] (1) [c2id-node-2-54714])
2015-10-29T12:22:35,294 DEBUG Incoming-3,c2id-node-2-54714 FD_SOCK - c2id-node-2-54714: VIEW_CHANGE received: [c2id-node-1-18639, c2id-node-2-54714] 2015-10-29T12:22:35,296 DEBUG FD_SOCK pinger,c2id-node-2-54714 FD_SOCK - c2id-node-2-54714: ping_dest is c2id-node-1-18639, pingable_mbrs=[c2id-node-1-18639, c2id-node-2-54714]
2015-10-29T12:22:35,301 DEBUG Incoming-3,c2id-node-2-54714 STABLE - resuming message garbage collection
... and in catalina.out
:
29-Oct-2015 12:22:35.297 INFO [Incoming-3,c2id-node-2-54714] org.infinispan.remoting.transport.jgroups.JGroupsTransport.viewAccepted ISPN000093: Received new, MERGED cluster view for channel ISPN: MergeView::[c2id-node-1-18639|5] (2) [c2id-node-1-18639, c2id-node-2-54714], 2 subgroups: [c2id-node-2-54714|4] (1) [c2id-node-2-54714], [c2id-node-1-18639|4] (1) [c2id-node-1-18639]
As logged by the coordinator (c2id-node-1), in c2id-server.log
:
2015-10-29T12:22:35,260 DEBUG Timer-2,c2id-node-1-18639 MERGE3 - I (c2id-node-1-18639) will be the merge leader
2015-10-29T12:22:35,266 DEBUG ViewHandler,c2id-node-1-18639 Merger - c2id-node-1-18639: I will be the leader. Starting the merge task for 2 coords
2015-10-29T12:22:35,273 DEBUG MergeTask,c2id-node-1-18639 STABLE - suspending message garbage collection
2015-10-29T12:22:35,273 DEBUG MergeTask,c2id-node-1-18639 STABLE - c2id-node-1-18639: resume task started, max_suspend_time=220000
2015-10-29T12:22:35,274 DEBUG MergeTask,c2id-node-1-18639 Merger - c2id-node-1-18639: merge task c2id-node-1-18639::1 started with 2 coords
2015-10-29T12:22:35,285 DEBUG MergeTask,c2id-node-1-18639 Merger - c2id-node-1-18639: installing merge view [c2id-node-1-18639|5] (2 members) in 2 coords
2015-10-29T12:22:35,286 DEBUG MergeTask,c2id-node-1-18639 Merger - c2id-node-1-18639: merge c2id-node-1-18639::1 took 18 ms
2015-10-29T12:22:35,288 DEBUG Incoming-1,c2id-node-1-18639 NAKACK2 - [c2id-node-1-18639 mergeDigest()]
existing digest: c2id-node-1-18639: [11 (11)]
new digest: c2id-node-1-18639: [10 (10)], c2id-node-2-54714: [4 (4)]
resulting digest: c2id-node-1-18639: [11 (11)], c2id-node-2-54714: [4 (4)]
2015-10-29T12:22:35,288 DEBUG Incoming-1,c2id-node-1-18639 GMS - c2id-node-1-18639: installing view MergeView::[c2id-node-1-18639|5] (2) [c2id-node-1-18639, c2id-node-2-54714], 2 subgroups: [c2id-node-2-54714|4] (1) [c2id-node-2-54714], [c2id-node-1-18639|4] (1) [c2id-node-1-18639]
2015-10-29T12:22:35,289 DEBUG Incoming-1,c2id-node-1-18639 FD_SOCK - c2id-node-1-18639: VIEW_CHANGE received: [c2id-node-1-18639, c2id-node-2-54714]
2015-10-29T12:22:35,289 DEBUG FD_SOCK pinger,c2id-node-1-18639 FD_SOCK - c2id-node-1-18639: ping_dest is c2id-node-2-54714, pingable_mbrs=[c2id-node-1-18639, c2id-node-2-54714]
2015-10-29T12:22:35,295 DEBUG Incoming-1,c2id-node-1-18639 STABLE - resuming message garbage collection
... and in catalina.out
:
29-Oct-2015 12:22:35.290 INFO [Incoming-1,c2id-node-1-18639] org.infinispan.remoting.transport.jgroups.JGroupsTransport.viewAccepted ISPN000093: Received new, MERGED cluster view for channel ISPN: MergeView::[c2id-node-1-18639|5] (2) [c2id-node-1-18639, c2id-node-2-54714], 2 subgroups: [c2id-node-2-54714|4] (1) [c2id-node-2-54714], [c2id-node-1-18639|4] (1) [c2id-node-1-18639]
29-Oct-2015 12:22:35.376 WARN [transport-thread--p2-t9] org.infinispan.partitionhandling.impl.PreferAvailabilityStrategy.checkForLostData ISPN000314: Cache sessionStore.sessionMap lost at least half of the stable members, possible split brain causing data inconsistency. Current members are [c2id-node-1-18639], lost members are [c2id-node-2-54714], stable members are [c2id-node-1-18639, c2id-node-2-54714]
29-Oct-2015 12:22:35.376 WARN [transport-thread--p2-t10] org.infinispan.partitionhandling.impl.PreferAvailabilityStrategy.checkForLostData ISPN000314: Cache op.consentSessionMap lost at least half of the stable members, possible split brain causing data inconsistency. Current members are [c2id-node-1-18639], lost members are [c2id-node-2-54714], stable members are [c2id-node-1-18639, c2id-node-2-54714]
29-Oct-2015 12:22:35.379 INFO [transport-thread--p2-t9] <unknown>.<unknown> ISPN000310: Starting cluster-wide rebalance for cache sessionStore.sessionMap, topology CacheTopology{id=10, rebalanceId=3, currentCH=ReplicatedConsistentHash{ns = 60, owners = (1)[c2id-node-1-18639: 60]}, pendingCH=ReplicatedConsistentHash{ns = 60, owners = (2)[c2id-node-1-18639: 30, c2id-node-2-54714: 30]}, unionCH=null, actualMembers=[c2id-node-1-18639, c2id-node-2-54714]}
29-Oct-2015 12:22:35.386 WARN [transport-thread--p2-t9] org.infinispan.partitionhandling.impl.PreferAvailabilityStrategy.checkForLostData ISPN000314: Cache op.regCache lost at least half of the stable members, possible split brain causing data inconsistency. Current members are [c2id-node-1-18639], lost members are [c2id-node-2-54714], stable members are [c2id-node-1-18639, c2id-node-2-54714]
29-Oct-2015 12:22:35.387 INFO [transport-thread--p2-t9] org.infinispan.topology.ClusterTopologyManagerImpl.broadcastRebalanceStart ISPN000310: Starting cluster-wide rebalance for cache op.regCache, topology CacheTopology{id=10, rebalanceId=3, currentCH=ReplicatedConsistentHash{ns = 60, owners = (1)[c2id-node-1-18639: 60]}, pendingCH=ReplicatedConsistentHash{ns = 60, owners = (2)[c2id-node-1-18639: 30, c2id-node-2-54714: 30]}, unionCH=null, actualMembers=[c2id-node-1-18639, c2id-node-2-54714]}
29-Oct-2015 12:22:35.389 INFO [transport-thread--p2-t10] org.infinispan.topology.ClusterTopologyManagerImpl.broadcastRebalanceStart ISPN000310: Starting cluster-wide rebalance for cache op.consentSessionMap, topology CacheTopology{id=10, rebalanceId=3, currentCH=ReplicatedConsistentHash{ns = 60, owners = (1)[c2id-node-1-18639: 60]}, pendingCH=ReplicatedConsistentHash{ns = 60, owners = (2)[c2id-node-1-18639: 30, c2id-node-2-54714: 30]}, unionCH=null, actualMembers=[c2id-node-1-18639, c2id-node-2-54714]}
29-Oct-2015 12:22:35.392 WARN [transport-thread--p2-t10] org.infinispan.partitionhandling.impl.PreferAvailabilityStrategy.checkForLostData ISPN000314: Cache op.authSessionMap lost at least half of the stable members, possible split brain causing data inconsistency. Current members are [c2id-node-1-18639], lost members are [c2id-node-2-54714], stable members are [c2id-node-1-18639, c2id-node-2-54714]
29-Oct-2015 12:22:35.395 WARN [transport-thread--p2-t9] org.infinispan.partitionhandling.impl.PreferAvailabilityStrategy.checkForLostData ISPN000314: Cache op.regTokenCache lost at least half of the stable members, possible split brain causing data inconsistency. Current members are [c2id-node-1-18639], lost members are [c2id-node-2-54714], stable members are [c2id-node-1-18639, c2id-node-2-54714]
29-Oct-2015 12:22:35.396 INFO [transport-thread--p2-t10] org.infinispan.topology.ClusterTopologyManagerImpl.broadcastRebalanceStart ISPN000310: Starting cluster-wide rebalance for cache op.authSessionMap, topology CacheTopology{id=10, rebalanceId=3, currentCH=ReplicatedConsistentHash{ns = 60, owners = (1)[c2id-node-1-18639: 60]}, pendingCH=ReplicatedConsistentHash{ns = 60, owners = (2)[c2id-node-1-18639: 30, c2id-node-2-54714: 30]}, unionCH=null, actualMembers=[c2id-node-1-18639, c2id-node-2-54714]}
29-Oct-2015 12:22:35.399 INFO [transport-thread--p2-t9] org.infinispan.topology.ClusterTopologyManagerImpl.broadcastRebalanceStart ISPN000310: Starting cluster-wide rebalance for cache op.regTokenCache, topology CacheTopology{id=10, rebalanceId=3, currentCH=ReplicatedConsistentHash{ns = 60, owners = (1)[c2id-node-1-18639: 60]}, pendingCH=ReplicatedConsistentHash{ns = 60, owners = (2)[c2id-node-1-18639: 30, c2id-node-2-54714: 30]}, unionCH=null, actualMembers=[c2id-node-1-18639, c2id-node-2-54714]}
29-Oct-2015 12:22:35.401 WARN [transport-thread--p2-t10] org.infinispan.partitionhandling.impl.PreferAvailabilityStrategy.checkForLostData ISPN000314: Cache authzStore.accessTokenMap lost at least half of the stable members, possible split brain causing data inconsistency. Current members are [c2id-node-1-18639], lost members are [c2id-node-2-54714], stable members are [c2id-node-1-18639, c2id-node-2-54714]
29-Oct-2015 12:22:35.402 INFO [transport-thread--p2-t10] org.infinispan.topology.ClusterTopologyManagerImpl.broadcastRebalanceStart ISPN000310: Starting cluster-wide rebalance for cache authzStore.accessTokenMap, topology CacheTopology{id=10, rebalanceId=3, currentCH=ReplicatedConsistentHash{ns = 60, owners = (1)[c2id-node-1-18639: 60]}, pendingCH=ReplicatedConsistentHash{ns = 60, owners = (2)[c2id-node-1-18639: 30, c2id-node-2-54714: 30]}, unionCH=null, actualMembers=[c2id-node-1-18639, c2id-node-2-54714]}
29-Oct-2015 12:22:35.405 WARN [transport-thread--p2-t9] org.infinispan.partitionhandling.impl.PreferAvailabilityStrategy.checkForLostData ISPN000314: Cache authzStore.codeMap lost at least half of the stable members, possible split brain causing data inconsistency. Current members are [c2id-node-1-18639], lost members are [c2id-node-2-54714], stable members are [c2id-node-1-18639, c2id-node-2-54714]
29-Oct-2015 12:22:35.406 INFO [transport-thread--p2-t9] org.infinispan.topology.ClusterTopologyManagerImpl.broadcastRebalanceStart ISPN000310: Starting cluster-wide rebalance for cache authzStore.codeMap, topology CacheTopology{id=10, rebalanceId=3, currentCH=ReplicatedConsistentHash{ns = 60, owners = (1)[c2id-node-1-18639: 60]}, pendingCH=ReplicatedConsistentHash{ns = 60, owners = (2)[c2id-node-1-18639: 30, c2id-node-2-54714: 30]}, unionCH=null, actualMembers=[c2id-node-1-18639, c2id-node-2-54714]}
29-Oct-2015 12:22:35.410 WARN [transport-thread--p2-t10] org.infinispan.partitionhandling.impl.PreferAvailabilityStrategy.checkForLostData ISPN000314: Cache authzStore.authzCache lost at least half of the stable members, possible split brain causing data inconsistency. Current members are [c2id-node-1-18639], lost members are [c2id-node-2-54714], stable members are [c2id-node-1-18639, c2id-node-2-54714]
29-Oct-2015 12:22:35.411 INFO [transport-thread--p2-t10] org.infinispan.topology.ClusterTopologyManagerImpl.broadcastRebalanceStart ISPN000310: Starting cluster-wide rebalance for cache authzStore.authzCache, topology CacheTopology{id=10, rebalanceId=3, currentCH=ReplicatedConsistentHash{ns = 60, owners = (1)[c2id-node-1-18639: 60]}, pendingCH=ReplicatedConsistentHash{ns = 60, owners = (2)[c2id-node-1-18639: 30, c2id-node-2-54714: 30]}, unionCH=null, actualMembers=[c2id-node-1-18639, c2id-node-2-54714]}
29-Oct-2015 12:22:35.414 WARN [transport-thread--p2-t9] org.infinispan.partitionhandling.impl.PreferAvailabilityStrategy.checkForLostData ISPN000314: Cache sessionStore.subjectMap lost at least half of the stable members, possible split brain causing data inconsistency. Current members are [c2id-node-1-18639], lost members are [c2id-node-2-54714], stable members are [c2id-node-1-18639, c2id-node-2-54714]
29-Oct-2015 12:22:35.415 INFO [transport-thread--p2-t9] org.infinispan.topology.ClusterTopologyManagerImpl.broadcastRebalanceStart ISPN000310: Starting cluster-wide rebalance for cache sessionStore.subjectMap, topology CacheTopology{id=10, rebalanceId=3, currentCH=ReplicatedConsistentHash{ns = 60, owners = (1)[c2id-node-1-18639: 60]}, pendingCH=ReplicatedConsistentHash{ns = 60, owners = (2)[c2id-node-1-18639: 30, c2id-node-2-54714: 30]}, unionCH=null, actualMembers=[c2id-node-1-18639, c2id-node-2-54714]}
29-Oct-2015 12:22:35.454 INFO [remote-thread--p3-t11] org.infinispan.topology.ClusterCacheStatus.endRebalance ISPN000336: Finished cluster-wide rebalance for cache op.regCache, topology id = 10
29-Oct-2015 12:22:35.464 INFO [remote-thread--p3-t18] org.infinispan.topology.ClusterCacheStatus.endRebalance ISPN000336: Finished cluster-wide rebalance for cache authzStore.accessTokenMap, topology id = 10
29-Oct-2015 12:22:35.464 INFO [remote-thread--p3-t11] <unknown>.<unknown> ISPN000336: Finished cluster-wide rebalance for cache op.consentSessionMap, topology id = 10
29-Oct-2015 12:22:35.473 INFO [remote-thread--p3-t11] org.infinispan.topology.ClusterCacheStatus.endRebalance ISPN000336: Finished cluster-wide rebalance for cache authzStore.authzCache, topology id = 10
29-Oct-2015 12:22:35.478 INFO [remote-thread--p3-t11] org.infinispan.topology.ClusterCacheStatus.endRebalance ISPN000336: Finished cluster-wide rebalance for cache op.authSessionMap, topology id = 10
29-Oct-2015 12:22:35.479 INFO [remote-thread--p3-t25] org.infinispan.topology.ClusterCacheStatus.endRebalance ISPN000336: Finished cluster-wide rebalance for cache sessionStore.sessionMap, topology id = 10
29-Oct-2015 12:22:35.480 INFO [remote-thread--p3-t23] org.infinispan.topology.ClusterCacheStatus.endRebalance ISPN000336: Finished cluster-wide rebalance for cache op.regTokenCache, topology id = 10
29-Oct-2015 12:22:35.479 INFO [remote-thread--p3-t18] <unknown>.<unknown> ISPN000336: Finished cluster-wide rebalance for cache authzStore.codeMap, topology id = 10
29-Oct-2015 12:22:35.480 INFO [remote-thread--p3-t20] <unknown>.<unknown> ISPN000336: Finished cluster-wide rebalance for cache sessionStore.subjectMap, topology id = 10
7. Understanding log messages
7.1 Logical addresses
Each cluster node is identified by its Infinispan / JGroups logical address, which typically consists of the node hostname and an integer:
<hostname-int>
For example, with hostname c2id-node-1
:
c2id-node-1-18639
7.2 Views
Each node maintains a view of the cluster, which has the following format:
[<coordinator-address>|<view-sequence>] (node-count) [node-list...]
For example:
GMS - c2id-node-2-61644: installing view [c2id-node-1-18639|1] (2) [c2id-node-1-18639, c2id-node-2-61644]
which means:
- The current cluster coordinator is
c2id-node-18639
- The view has a sequence number of 1. Each time the view of a cluster changes this number is incremented. Using this number you can infer how many times the cluster members have changed since the cluster was created.
- The cluster consists of two nodes - (2).
- The cluster nodes are
c2id-node-1-18639
andc2id-node-2-61644
.