Apache Doris

Posted on Dec 18

How to Handle IP Changes in Apache Doris

#bigdata #apachedoris #database #olap

Background note

A single host can have several IP addresses, either because it has multiple network interface cards or because installing Docker or a similar environment has created virtual network interfaces. Apache Doris currently does not work out on its own which of these IPs it should use. So when a deployment host has multiple IPs, you must pin the correct one with the priority_networks configuration item.

Both FE and BE support priority_networks, so it must be set in both fe.conf and be.conf. It tells the FE or BE process which IP to bind to when it starts. For example:

priority_networks = 10.1.3.0/24

The value is written in CIDR notation. At startup, FE or BE looks for a local IP that falls inside this range and uses it as its localIP.
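To make the matching step concrete, here is a minimal Python sketch of the idea using the standard `ipaddress` module. It is not Doris's actual selection code (FE is written in Java and BE in C++). The helper name `pick_local_ip` and the candidate address list are hypothetical, standing in for what `ip addr` might report on a host with a Docker bridge.

```python
import ipaddress


def pick_local_ip(candidates, priority_networks):
    """Return the first candidate IP inside the CIDR block, or None.

    Illustrative sketch of the priority_networks idea only, not the
    exact logic Doris FE/BE use.
    """
    # strict=False accepts a CIDR with host bits set, e.g. "10.1.3.7/24"
    network = ipaddress.ip_network(priority_networks, strict=False)
    for ip in candidates:
        if ipaddress.ip_address(ip) in network:
            return ip
    return None  # no candidate falls inside this CIDR block


# Hypothetical host: loopback, a Docker bridge, and the real NIC
host_ips = ["127.0.0.1", "172.17.0.1", "10.1.3.25"]
print(pick_local_ip(host_ips, "10.1.3.0/24"))
```

Without the CIDR hint, any of the three addresses could be picked; with it, only 10.1.3.25 qualifies.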

CIDR uses slash notation: IP address/prefix length, where the number after the slash is how many leading bits make up the network ID. The two examples below show the conversion.

① 192.168.0.0/16 in 32-bit binary is 11000000.10101000.00000000.00000000. The /16 means the first 16 bits are the network ID and stay fixed, so the block covers 11000000.10101000.00000000.00000000 ~ 11000000.10101000.11111111.11111111, that is, 192.168.0.0 ~ 192.168.255.255.

② 192.168.1.2/24 in 32-bit binary is 11000000.10101000.00000001.00000010. The /24 means the first 24 bits stay fixed, so the block covers 11000000.10101000.00000001.00000000 ~ 11000000.10101000.00000001.11111111, that is, 192.168.1.0 ~ 192.168.1.255.
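The conversions in ① and ② can be checked mechanically. This is a small sketch using Python's standard `ipaddress` module; the helper names `to_binary` and `cidr_range` are made up for illustration.

```python
import ipaddress


def to_binary(ip):
    """Render an IPv4 address as dotted 32-bit binary (8 bits per octet)."""
    return ".".join(f"{int(octet):08b}" for octet in str(ip).split("."))


def cidr_range(cidr):
    """Return the first and last address covered by a CIDR block."""
    # strict=False masks off host bits, so 192.168.1.2/24 -> 192.168.1.0/24
    net = ipaddress.ip_network(cidr, strict=False)
    return net.network_address, net.broadcast_address


for cidr in ["192.168.0.0/16", "192.168.1.2/24"]:
    addr = cidr.split("/")[0]
    first, last = cidr_range(cidr)
    print(f"{cidr}: {to_binary(addr)}")
    print(f"  range {first} ~ {last}")
    print(f"  binary {to_binary(first)} ~ {to_binary(last)}")
```

For ② it prints 11000000.10101000.00000001.00000010 for the address itself and the range 192.168.1.0 ~ 192.168.1.255.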

In the following scenarios the IP changes, and FE/BE may then fail to start or stop working normally:

① Cluster migration changes the IP network segment

② A dynamically assigned address in a virtual environment changes

③ priority_networks was not configured correctly before FE/BE was restarted, so the IP obtained after the restart does not match the one recorded in the metadata

1. Hardware information

CPU architecture: ARM64
Memory: 2GB
Hard drive: 36GB SSD

2. Software information

VM image: CentOS 7
Apache Doris version: 1.2.4 (other versions also work)
Cluster size: 1 FE + 3 BE

FE recovery

3. Exception log

Checking fe.out shows an exception at startup, and the FE process cannot start.

Before you proceed, back up all FE metadata and stop all upstream reads and writes!

4. Get the current IP

ip addr

5. Reset IP information

After the IP information is reset, the same exception is still reported, so the metadata must be reset as well.

# Modify priority_networks in fe.conf
priority_networks = 192.168.0.0/16
# or, equivalently (same /16 block; keep only one of the two lines)
priority_networks = 192.168.31.78/16
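The two priority_networks values in step 5 describe the same /16 block, because everything after the first 16 bits is ignored. A quick check with Python's `ipaddress` module (`strict=False` is needed because 192.168.31.78/16 has host bits set; without it the call raises ValueError):

```python
import ipaddress

a = ipaddress.ip_network("192.168.0.0/16")
# strict=False masks off the host bits instead of raising ValueError
b = ipaddress.ip_network("192.168.31.78/16", strict=False)

print(a == b)  # both are 192.168.0.0/16
print(ipaddress.ip_address("192.168.31.78") in a)  # the new FE IP matches
```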

6. Reset metadata record

After the metadata record is reset, the FE process can start but is still unusable; it has to be brought back in metadata recovery mode.

# Comment out the old IP previously recorded in the FE metadata
vim doris-meta/image/ROLE
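If you would rather script this edit than do it in vim, the sketch below comments out every active line that mentions the old IP. It assumes the ROLE file is a plain key=value text file in which `#` starts a comment; the helper name `comment_out_old_ip` is hypothetical, and the path and old IP 192.168.31.81 come from this walkthrough. Run it only against a backed-up doris-meta directory.

```python
import re
from pathlib import Path


def comment_out_old_ip(role_path, old_ip):
    """Prefix '#' to each active line that mentions old_ip; return the count.

    Assumes a key=value text file where '#' starts a comment.
    """
    path = Path(role_path)
    # Match the IP as a whole token, so 192.168.31.8 does not hit 192.168.31.81
    pattern = re.compile(r"(?<![\d.])" + re.escape(old_ip) + r"(?!\d)")
    changed = 0
    out = []
    for line in path.read_text().splitlines(keepends=True):
        if pattern.search(line) and not line.lstrip().startswith("#"):
            out.append("#" + line)  # comment the line out, keep its content
            changed += 1
        else:
            out.append(line)
    path.write_text("".join(out))
    return changed


# Usage (path relative to the FE deployment directory):
# comment_out_old_ip("doris-meta/image/ROLE", "192.168.31.81")
```

Returning the count makes it easy to confirm that exactly the expected number of lines changed.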

7. Metadata mode recovery

# Add metadata_failure_recovery=true to fe.conf, then restart FE in recovery mode
vim fe.conf
metadata_failure_recovery=true
# Then open http://192.168.31.78:8030/login; if the FE web UI loads, FE has started normally
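To run the "does the web UI open" check from a script instead of a browser, here is a minimal probe built only on Python's standard library. The helper name `fe_ui_reachable` is hypothetical, and the URL (port 8030) is the one from this example. A successful response only shows that the FE HTTP server is answering, not that the whole cluster is healthy.

```python
import urllib.request


def fe_ui_reachable(url, timeout=5.0):
    """Return True if the URL answers with a 2xx/3xx HTTP status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except OSError:
        # URLError, HTTPError, refused connections and timeouts all subclass OSError
        return False


# Usage against the FE in this walkthrough:
# fe_ui_reachable("http://192.168.31.78:8030/login")
```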

8. Reset fe cluster node

Although FE now starts in metadata recovery mode, it is not fully restored yet: the cluster nodes recorded in the FE metadata do not include the node with the new IP.

# Run the following SQL in a MySQL client or the web UI Playground to update the FE nodes recorded in the FE metadata
# Remove the old IP node
ALTER SYSTEM DROP FOLLOWER "192.168.31.81:9010";
# Add the new IP node
ALTER SYSTEM ADD FOLLOWER "192.168.31.78:9010";

The old IP nodes are as follows.

The new IP node after reset is as follows.

9. Turn off metadata mode and restart FE

# Comment out metadata_failure_recovery=true in fe.conf to turn off recovery mode, then restart FE
vim fe.conf
#metadata_failure_recovery=true

# Then open http://192.168.31.78:8030/login; if the FE web UI loads, FE is fully restored
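Steps 7 and 9 toggle the same fe.conf entry. A hypothetical helper like `set_recovery_mode` below keeps that edit repeatable: it drops any existing line for the key, whether active or commented out, and appends the desired form. This is a sketch that assumes fe.conf is a plain key=value file in which `#` starts a comment; restart FE yourself after each change.

```python
import re
from pathlib import Path


def set_recovery_mode(conf_path, enabled):
    """Turn metadata_failure_recovery=true on or comment it out in fe.conf."""
    path = Path(conf_path)
    # Matches both "metadata_failure_recovery=..." and "#metadata_failure_recovery=..."
    key = re.compile(r"^\s*#?\s*metadata_failure_recovery\s*=")
    kept = [ln for ln in path.read_text().splitlines() if not key.match(ln)]
    kept.append("metadata_failure_recovery=true" if enabled
                else "#metadata_failure_recovery=true")
    path.write_text("\n".join(kept) + "\n")


# Usage: set_recovery_mode("fe.conf", True) before step 7's restart,
#        set_recovery_mode("fe.conf", False) before step 9's restart
```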

BE Recovery

10. Get the current IP

ip addr

11. Reset IP information

# Modify priority_networks in be.conf
priority_networks = 192.168.0.0/16
# or, equivalently (same /16 block; keep only one of the two lines)
priority_networks = 192.168.31.136/16
# After saving the change, restart BE

12. Reset BE cluster node

Although BE can now start, it is not fully restored yet: the BE nodes recorded in the FE metadata do not include the BE with the new IP.

# Run the following SQL in a MySQL client or the web UI Playground to update the BE nodes recorded in the FE metadata
# BEs are addressed as host:heartbeat_service_port (9050 by default), not as FOLLOWER with port 9010
# Remove the old IP nodes (Doris deliberately spells this command DROPP)
ALTER SYSTEM DROPP BACKEND "192.168.31.81:9050";
ALTER SYSTEM DROPP BACKEND "192.168.31.72:9050";
ALTER SYSTEM DROPP BACKEND "192.168.31.133:9050";
# Add the new IP nodes
ALTER SYSTEM ADD BACKEND "192.168.31.78:9050";
ALTER SYSTEM ADD BACKEND "192.168.31.71:9050";
ALTER SYSTEM ADD BACKEND "192.168.31.136:9050";
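With several nodes to swap, writing each statement by hand invites copy-paste slips. The hypothetical helper below generates them from old and new IP lists. It assumes default ports: FE followers are addressed by edit_log_port 9010 and BEs by heartbeat_service_port 9050. Doris uses DROP for followers but the deliberately double-P DROPP for backends.

```python
def node_swap_sql(kind, old_ips, new_ips, port):
    """Build ALTER SYSTEM statements that replace old node IPs with new ones.

    kind: "FOLLOWER" (FE, port = edit_log_port) or
          "BACKEND"  (BE, port = heartbeat_service_port).
    """
    drop_verb = {"FOLLOWER": "DROP", "BACKEND": "DROPP"}[kind]
    stmts = [f'ALTER SYSTEM {drop_verb} {kind} "{ip}:{port}";' for ip in old_ips]
    stmts += [f'ALTER SYSTEM ADD {kind} "{ip}:{port}";' for ip in new_ips]
    return stmts


# The three BEs from this walkthrough, assuming the default heartbeat port
for stmt in node_swap_sql("BACKEND",
                          ["192.168.31.81", "192.168.31.72", "192.168.31.133"],
                          ["192.168.31.78", "192.168.31.71", "192.168.31.136"],
                          9050):
    print(stmt)
```

Review the generated statements before running them; dropping a node is not something to automate blindly.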

After all three BEs were reset, they were fully restored as follows.

At this point, the Apache Doris cluster failure caused by the IP change has been resolved and the cluster is fully restored.

DEV Community