I thought it might be an idea to quickly blog the method we use when troubleshooting connectivity issues between two servers. This might seem like basic stuff, but it's rarely captured in one place, and almost never in the kind of checklist format that is useful for someone who lacks the experience to mentally formulate the checklist as they go.
For the purposes of this blog, let's imagine we're connecting a Linux web server to a database server running MySQL. It's a fairly classic scenario for a web developer working in almost any open source programming language, as MySQL is the "de facto" database back-end for Linux-based open source applications. However, the same principles apply when debugging any TCP/IP connection between applications on different servers.
I should note this is absolutely not for the total beginner - there is some assumed knowledge here, namely that you know enough MySQL to get a database set up and that you know how to SSH on to servers (and you have SSH access to the servers concerned). If that is the case, read on. If not, I suggest you find help for these first steps somewhere else, then come back.
So, you have a web server with your application copied to it, and you have a separate server for the database. You've created the database and a user for the database, entered those details into your application, but it still can't connect. Now what?
1. Am I using the correct addresses?
This might sound stupid, but it's easy to use the wrong address. One such scenario is you create two servers at AWS, you give them elastic IP addresses (EIPs), you note the address of the database server and try to connect using that address from the web server. However, the EIPs are blocked by default externally, and by using the EIPs instead of the internal IPs you are making your database traffic leave AWS and come back again, where it is likely blocked at the VPC (simplistically a firewall) because you turned it into external traffic. So those addresses are wrong. You should be using the internal IP addressing.
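As a quick sanity check on the address in your application config, a rough sketch like the one below flags whether an IPv4 address falls in one of the RFC 1918 private ranges. The function name is_private_ip is made up for illustration, and it only pattern-matches the string; it doesn't validate the address or know anything about your particular network.

```shell
# Hypothetical helper: succeed if the IPv4 address is in an RFC 1918 private range.
is_private_ip() {
    case "$1" in
        10.*)                                   return 0 ;;  # 10.0.0.0/8
        172.1[6-9].*|172.2[0-9].*|172.3[01].*)  return 0 ;;  # 172.16.0.0/12
        192.168.*)                              return 0 ;;  # 192.168.0.0/16
        *)                                      return 1 ;;
    esac
}

# Example: the database address used in this post
if is_private_ip 192.168.153.16; then
    echo "192.168.153.16 looks like an internal address"
else
    echo "192.168.153.16 looks like a public address"
fi
```

If the address in your config comes back as public when both servers live in the same private network, that's a hint you may be routing traffic the long way round.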
So the first thing to check is if you can ping the database server you want to reach, from the web server. In my case it's an internal IP address on a LAN that I'm using in my application config, rather than a fully qualified domain name, and that IP is 192.168.153.16. A good ping looks like this:
$ ping -c 4 192.168.153.16
PING 192.168.153.16 (192.168.153.16) 56(84) bytes of data.
64 bytes from 192.168.153.16: icmp_seq=1 ttl=51 time=45.8 ms
64 bytes from 192.168.153.16: icmp_seq=2 ttl=51 time=45.8 ms
64 bytes from 192.168.153.16: icmp_seq=3 ttl=51 time=45.6 ms
64 bytes from 192.168.153.16: icmp_seq=4 ttl=51 time=45.7 ms

--- 192.168.153.16 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3004ms
rtt min/avg/max/mdev = 45.650/45.782/45.854/0.081 ms
(It's always a good idea to use -c to limit your ping, just in case something happens that causes you to get stuck in ping. It used to happen to us all the time with VMware's shonky shell console, if we forgot to put a -c in.)
A bad ping will sit there and do nothing until it either times out or you hit Ctrl+C to kill it.
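If you want something scriptable rather than eyeballing the output, here's a minimal sketch that pulls the packet loss figure out of ping's summary line. The name ping_loss_percent is my own invention, and it assumes the Linux (iputils) summary format shown above.

```shell
# Hypothetical helper: read ping output on stdin, print the packet loss percentage.
ping_loss_percent() {
    sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p'
}

# Usage (-c limits the number of pings, -W caps the wait for each reply in seconds):
#   ping -c 4 -W 2 192.168.153.16 | ping_loss_percent
```

Adding -W alongside -c means a bad ping gives up by itself instead of sitting there until you kill it.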
If you can't ping, something is fundamentally incorrect about the connection you're trying to make. If you're using a fully qualified domain name instead of an IP address, you might want to run sudo ifconfig on the database server to check its internal address, then try pinging that directly. If nothing works, you should seek support from a network engineer to understand why you do not have basic connectivity where you expect to.
(Worth noting that sometimes ping is blocked. If you know that is the case you can jump this step. There's a constant battle between network security professionals and network administrators as to whether ICMP ping should be permitted or not. Security professionals say it should be blocked. Network administrators say it's a vital tool, it's broadly harmless, it's part of the standard and blocking it is anti-social and unhelpful. At Code Enigma we are in the latter camp, clearly.)
2. Can I telnet?
Assuming we can ping, the next step is to check if the server allows us to connect to the expected port. Usually we'll be dealing with TCP/IP, which means usually we'll be able to try this with a simple telnet connection. In this case our expected port is 3306, MySQL's default, so the command looks like this, from the web server:
$ telnet 192.168.153.16 3306
Trying 192.168.153.16...
Connected to 192.168.153.16.
Escape character is '^]'.
That's a successful connection. If you see this, the network is fine and your problem is not network related. Back to the drawing board!
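Like ping, if traffic is being silently dropped telnet will simply sit there and do nothing. It should connect almost immediately; if it doesn't, there's definitely a connectivity issue. If you get "Connection refused" straight away instead, the host is reachable but nothing is accepting connections on that port (or a firewall is actively rejecting them). (Note: for UDP connections you need to do things differently.)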
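Not every server has telnet installed these days. As an alternative sketch, bash can open a TCP connection by itself via its /dev/tcp pseudo-device, and wrapping that in the coreutils timeout command stops it hanging. The function name port_open and the 3 second default are assumptions of mine, not anything standard.

```shell
# Hypothetical helper: succeed if a TCP connection to host:port opens within the timeout.
# Relies on bash's /dev/tcp pseudo-device and the coreutils `timeout` command.
port_open() {
    host=$1; port=$2; secs=${3:-3}
    timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# Usage from the web server:
#   port_open 192.168.153.16 3306 && echo "port reachable" || echo "no connection"
```

This only proves the TCP handshake completes, exactly like the telnet test; it says nothing about whether MySQL will accept your credentials.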
3. Is the service I want running on the port I'm expecting?
This is a classic. You can shortcut hours of staring at firewall configs and scratching your head by making sure the service you're after is running. In our case it's mysqld so we would do something like this on the database server:
$ sudo ps aux | grep mysqld
root      6838  0.0  0.0   12748   2192 pts/0  S+   10:12   0:00 grep mysqld
root     30299  0.0  0.0    4336   1624 ?      S    Jan30   0:00 /bin/sh /usr/bin/mysqld_safe
mysql    30970  0.1  6.2 1062692 253120 ?      Sl   Jan30  16:56 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib/mysql/plugin --user=mysql --log-error=/var/lib/mysql/log.err --open-files-limit=2000 --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/run/mysqld/mysqld.sock --port=3306
In this case we're good. We can see MySQL is clearly running and right at the end there we can see port 3306, which is the standard MySQL port and what we're expecting. So yay.
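That command line is long enough that the port is easy to miss, so here's a small sketch that picks it out for you. The function name mysqld_port is hypothetical, and it only works when mysqld was started with an explicit --port argument as in the output above; if the port is set in a config file instead, it prints nothing.

```shell
# Hypothetical helper: read `ps aux` output on stdin, print mysqld's --port value.
mysqld_port() {
    sed -n '/mysqld/s/.*--port=\([0-9][0-9]*\).*/\1/p'
}

# Usage on the database server:
#   ps aux | mysqld_port
# Related trick: grep '[m]ysqld' stops grep from matching its own process line:
#   ps aux | grep '[m]ysqld'
```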
If your service doesn't appear, start it (typically something like sudo service mysql start or sometimes /etc/init.d/mysqld start - depends on the system) and try to connect again. It could be as simple as that!
4. Is everything definitely OK on the database server?
The netstat command is useful for checking the process list isn't lying to you. The command you want looks like this:
$ sudo netstat -tulpn | grep 3306
tcp        0      0 0.0.0.0:3306            0.0.0.0:*               LISTEN      30970/mysqld
This is a good output. It shows me that something is indeed listening on all interfaces and port 3306, and it confirms the process is mysqld. Things to watch for include:
$ sudo netstat -tulpn | grep 3306
tcp        0      0 0.0.0.0:3306            0.0.0.0:*               LISTEN      26125/java
That's no good. Look at the end of the line: something else is running on port 3306, so MySQL probably couldn't start (though we should've caught that in the previous step).
Something like this might also be problematic:
$ sudo netstat -tulpn | grep 3306
tcp        0      0 127.0.0.1:3306          0.0.0.0:*               LISTEN      30970/mysqld
MySQL is up and running, but it's only listening on the loopback interface, 127.0.0.1 / localhost. So if we're connecting from another server, that's not going to work! So there's some config to figure out here to get this working.
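To make those three cases easier to tell apart at a glance, here's a rough sketch that reads netstat output and reports, for a given port, whether the listener is loopback-only and which process owns it. The function name listen_scope and its output wording are my own, and it assumes the column layout of netstat -tulpn shown above. For MySQL, the setting that controls the listening address is typically bind-address in the [mysqld] section of the config file.

```shell
# Hypothetical helper: read `netstat -tulpn` output on stdin and, for the given port,
# print whether the listener is loopback-only, plus its address and owning process.
listen_scope() {
    awk -v port="$1" '
        $6 == "LISTEN" {
            n = split($4, parts, ":")              # last piece is the port
            if (parts[n] != port) next
            addr = substr($4, 1, length($4) - length(parts[n]) - 1)
            scope = (addr == "127.0.0.1" || addr == "::1") ? "loopback-only" : "non-loopback"
            print scope, addr, $7
        }'
}

# Usage on the database server:
#   sudo netstat -tulpn | listen_scope 3306
```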
6. Is the web server firewall OK?
Let's leave the database server now. We know MySQL is listening on the right port, so it should be OK. There are only firewalls left that might be an issue, and to troubleshoot them I'd start back on the web server.
I'm going to assume for the purposes of this blog post that you use iptables. This is still the standard for most Linux systems, although some RHEL systems now use firewalld instead. I'm not getting into that!
To figure this out you need to look at the iptables output in a bit of detail. What we're looking for, because we're on the web server at this point, is firewall rules in the OUTPUT chain that will allow our web server to send a request out to our database server. This command will list only the OUTPUT chain, as that's all we care about, and filter it using grep for MySQL:
$ sudo iptables -L OUTPUT | grep mysql
ACCEPT     tcp  --  anywhere             192.168.153.16       tcp dpt:mysql
We can see there's an explicit rule in the OUTPUT chain for MySQL on our IP address, 192.168.153.16. That's what we're looking for. And that looks good.
If it wasn't there it doesn't necessarily mean there's an issue. Often the OUTPUT chain is completely empty, in which case outbound traffic is entirely permitted and you don't need an explicit rule. Other times a range of addresses might be permitted to do anything, like this rule which you might see in a LAN:
Chain OUTPUT (policy DROP)
target     prot opt source               destination
ACCEPT     all  --  0.0.0.0/0            172.30.0.0/16
In this case anything outbound to any 172.30.X.X address is permitted, so you wouldn't need an explicit rule for MySQL to go to an address within that range - it's already permitted.
So I'm assuming here you've seen one of the following:
the OUTPUT chain is empty and its policy is ACCEPT (not great, but you're not blocked either)
there's a permitted range and your database server IP is within it
there's an explicit rule in the OUTPUT chain for your database server IP
If any of these is true, we're good, let's move along.
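Working out whether an address falls inside a permitted range is easy for something like 172.30.0.0/16, and less so for odd prefix lengths. Here's a small sketch of the arithmetic in plain shell; the function names ip_to_int and ip_in_cidr are hypothetical, and it handles IPv4 only with no input validation.

```shell
# Hypothetical helper: pack a dotted-quad IPv4 address into a single integer.
ip_to_int() {
    old_ifs=$IFS; IFS=.
    set -- $1            # split the address into its four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Hypothetical helper: succeed if the address ($1) is inside the CIDR range ($2).
ip_in_cidr() {
    ip=$1; net=${2%/*}; bits=${2#*/}
    mask=$(( (4294967295 << (32 - bits)) & 4294967295 ))
    [ $(( $(ip_to_int "$ip") & mask )) -eq $(( $(ip_to_int "$net") & mask )) ]
}

# Example: is the database server covered by the 172.30.0.0/16 rule above?
if ip_in_cidr 192.168.153.16 172.30.0.0/16; then
    echo "covered by the range rule"
else
    echo "not covered, look for an explicit rule"
fi
```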
7. Is the database server firewall OK?
Before we leave our web server, we should make a note of the IP address we're connecting from so we can check it has access on the other side. This may vary, depending on your set-up. In my case, I'm connecting over internal IP addresses, so I look at the output of ifconfig on the web server to figure out what my internal IP actually is - I've blanked out public interface data:
$ sudo ifconfig
eth0      Link encap:Ethernet  HWaddr f2:3c:91:d5:d0:ed
          inet addr:X.X.X.X  Bcast:X.X.X.X  Mask:255.255.255.0
          inet6 addr: X::X::X:X:X:X/64 Scope:Link
          inet6 addr: X::X::X:X:X:X/64 Scope:Global
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:5103798 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3391637 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:3703290512 (3.4 GiB)  TX bytes:951776807 (907.6 MiB)

eth0:0    Link encap:Ethernet  HWaddr f2:3c:91:d5:d0:ed
          inet addr:192.168.151.252  Bcast:192.168.255.255  Mask:255.255.128.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:5209 errors:0 dropped:0 overruns:0 frame:0
          TX packets:5209 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:2437715 (2.3 MiB)  TX bytes:2437715 (2.3 MiB)
So that 192 address is clearly internal - 192.168.151.252. That's what we need to check for on the database server.
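If you'd rather not read through all of that, a sketch along these lines lists just the non-loopback IPv4 addresses. The function name ipv4_addrs is made up; it is meant to cope with both the older ifconfig format above ("inet addr:...") and the output of ip -4 addr show ("inet .../prefix"), which newer systems tend to ship in place of ifconfig.

```shell
# Hypothetical helper: read `ifconfig` or `ip addr` output on stdin,
# print the non-loopback IPv4 addresses, one per line.
ipv4_addrs() {
    grep -oE 'inet (addr:)?[0-9]+(\.[0-9]+){3}' | grep -oE '[0-9.]+$' | grep -v '^127\.'
}

# Usage on the web server:
#   ifconfig | ipv4_addrs
#   ip -4 addr show | ipv4_addrs
```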
Now let's go to the database server. As before, to figure this out you need to look at the iptables output in detail, starting with any config specifically for port 3306 but this time in the INPUT chain:
$ sudo iptables -L INPUT | grep mysql
$
Huh? What's wrong with this picture? There's nothing! There's no MySQL rule in the firewall's INPUT chain, so that's probably our problem. Let's fix it:
$ sudo iptables -A INPUT -p tcp -s 192.168.151.252 -d 192.168.153.16 --dport 3306 -j ACCEPT
There are plenty of tutorials out there for iptables, but just to illustrate, let's break this down:
-A, appending a rule,
INPUT, to the INPUT chain,
-p tcp, the TCP protocol,
-s, the source IP address (our web server),
-d, the destination IP address (the inbound interface IP of our database server),
--dport, the destination port on the database server (3306 for MySQL) and finally
-j, the action (in this case ACCEPT because we're allowing this).
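One thing to watch with -A is that it appends to the end of the chain, so if an earlier rule already drops or rejects the traffic, the new rule is never reached; -I INPUT 1 inserts at the top instead. The sketch below builds the rule arguments once and prints a command that only adds the rule if iptables -C (check) says it isn't already there. The function name allow_rule_args is hypothetical, and the script deliberately prints the command for review rather than running it.

```shell
# Hypothetical helper: print the iptables match/target arguments that allow
# one source host to reach one TCP port on one destination address.
allow_rule_args() {
    src=$1; dst=$2; port=$3
    printf '%s\n' "-p tcp -s $src -d $dst --dport $port -j ACCEPT"
}

# Build the arguments for the web server -> database server rule from this post,
# then print (not run) an add-if-missing command for review.
args=$(allow_rule_args 192.168.151.252 192.168.153.16 3306)
echo "sudo iptables -C INPUT $args || sudo iptables -A INPUT $args"
```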
Now let's have another look at our iptables output:
$ sudo iptables -L INPUT | grep mysql
ACCEPT     tcp  --  192.168.151.252      192.168.153.16       tcp dpt:mysql
This looks better! We now have a rule for MySQL, allowing traffic from 192.168.151.252 (on the left) to 192.168.153.16 (on the right) and the port is identified as 'mysql'. Because we're on the database server at this point, we added this rule to the INPUT chain - we're expecting inbound traffic from the web server.
As before, there are other firewall rules you might see which would mean this one isn't necessary. If any of the below is true you will be able to access your database:
the INPUT chain is empty and its policy is ACCEPT (this would be really bad though!)
there's a permitted range and your web server IP is within it
there's an explicit rule in the INPUT chain for your web server IP
And that's it. If you've got this far and you still can't connect, there must be something else in play. Either application issues, or perhaps some upstream filtering at an ISP or at your hosting service provider. In this case a whole new suite of tools comes into play, but for an initial sniff I'd recommend looking at "Matt's Trace Route" - mtr - which is in most Linux repositories.
A note on iptables
Before I go, it's worth noting you might not always be looking for application names in the iptables output. For consistency, I tend to use the -n flag with the actual port number, like so:
$ sudo iptables -L INPUT -n | grep 3306
ACCEPT     tcp  --  192.168.151.252      192.168.153.16       tcp dpt:3306
The -n flag tells iptables not to try to determine the application behind a port or indeed the name of the server behind an IP address. The output is purely IP addresses and port numbers when you use -n, which can be easier to read once you're used to it.
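One small catch with a bare grep 3306 is that it also matches anything else containing those digits, such as port 33060. A slightly stricter sketch matches the dpt: field exactly; the function name rules_for_port is hypothetical and it expects iptables -L -n output on stdin.

```shell
# Hypothetical helper: read `iptables -L -n` output on stdin and print only
# the rules whose destination port is exactly the one given.
rules_for_port() {
    grep -E "dpt:$1( |\$)"
}

# Usage on the database server:
#   sudo iptables -L INPUT -n | rules_for_port 3306
```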