Can't Ping From/To Windows XP

Posted by Justin Cunningham Fri, 09 May 2008 15:41:00 GMT

I ran into a strange problem recently: I couldn’t ping from or to a Windows XP box on my network. After some unsuccessful experimentation, I suspected that the NVIDIA chipset might be the cause, so I disabled TCP/IP Acceleration in the NVIDIA control panel. This fixed the problem immediately.

It appears that there is some kind of bug in the NVIDIA TCP/IP Hardware Acceleration that severely affects ICMP packets, so I would recommend disabling it. This isn’t the first I’ve heard of the NVIDIA network configuration settings causing all sorts of strange issues.

Complexity and Troubleshooting

Posted by Justin Cunningham Wed, 06 Feb 2008 20:13:00 GMT

Over the past day or so, I’ve been working on the absolute worst kind of tech problem: an intermittent one. Some time ago I deployed another web server, and recently my monitoring service started notifying me that about once an hour it would be inaccessible from the internet for a few minutes. The strange part was that I couldn’t detect any problem accessing the site locally, and most of the time it was accessible from outside the network.

I initially suspected that the culprit was my Mongrel installation, so I got to work troubleshooting that. I spent over an hour working on Mongrel before I realized that it probably wasn’t the problem. Indeed it wasn’t. My first mistake was not attacking the problem at the beginning. This server is deployed behind a Linux bridging firewall, and is physically a Sun Microsystems box running Xen. This particular server is virtualized on that box along with several other systems, and runs an Apache load balancer for the Mongrel cluster. In other words, there are many layers where something could go wrong. The correct approach would have been to verify network connectivity internally and externally during an outage period with a tool like ping. After I realized that Mongrel was running just fine, that’s what I did.

After I enabled ICMP packets to the host in the firewalls, I was able to determine that, intermittently, nothing was reaching that host. I inspected the routing table of the router, and was able to get packets to flow again whenever a disruption started. Obviously I couldn’t manually reset the routing every time the connection was lost, so I set out to find the real problem.
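The connectivity check described above can be automated so that outages are timestamped rather than noticed by chance. What follows is only a minimal sketch and not the tooling used at the time. It assumes a TCP connection attempt is an acceptable stand-in for ping (a raw ICMP probe normally needs elevated privileges), and the names `tcp_probe` and `outage_windows` are hypothetical.

```python
import socket


def tcp_probe(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Assumption: this is a TCP connect test, not an ICMP ping.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def outage_windows(samples):
    """Collapse (timestamp, ok) samples into (first_failure, last_failure) windows.

    samples must be in chronological order; each run of consecutive
    failed probes becomes one window.
    """
    windows = []
    start = last = None
    for timestamp, ok in samples:
        if not ok:
            if start is None:
                start = timestamp  # a new outage begins here
            last = timestamp
        elif start is not None:
            windows.append((start, last))  # the outage ended before this sample
            start = None
    if start is not None:
        windows.append((start, last))  # still failing at the final sample
    return windows
```

Running `tcp_probe` once a minute from both inside and outside the network, and feeding the results to `outage_windows`, would show whether the two vantage points fail at the same times.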

After examining my Xen configuration files, I started to look into the MAC addresses that the bridge was seeing. I quickly discovered that my server’s MAC address wasn’t listed, because after periods of inactivity it would effectively time out. Upon closer inspection of the server’s network configuration files, I was finally able to determine that while the server had static public IP addresses and a gateway set, it was also pulling a private IP and a gateway from the DHCP server. This interface was acting as the primary interface, which was causing the communication errors. Once I disabled DHCP and configured a static private IP address, I no longer experienced issues.
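A host that ends up with one gateway from static configuration and another from DHCP will typically show more than one default route. As a rough sketch, the check below scans routing-table text in the format printed by `ip route show` on Linux and lists every default gateway it finds. The function name and the sample addresses are illustrative assumptions, not values from the server in question.

```python
def default_gateways(route_output):
    """Return (gateway, device) pairs for each default route in the given text.

    Expects lines in the style of `ip route show`, for example
    "default via 192.168.1.1 dev eth1".
    """
    gateways = []
    for line in route_output.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0] == "default" and fields[1] == "via":
            gateway = fields[2]
            # The device name follows the "dev" keyword when it is present.
            device = fields[fields.index("dev") + 1] if "dev" in fields[:-1] else None
            gateways.append((gateway, device))
    return gateways


# Hypothetical routing table: one gateway from DHCP, one configured statically.
SAMPLE_ROUTES = """\
default via 192.168.1.1 dev eth1
default via 203.0.113.1 dev eth0
203.0.113.0/24 dev eth0 proto kernel scope link src 203.0.113.10
192.168.1.0/24 dev eth1 proto kernel scope link src 192.168.1.50
"""

found = default_gateways(SAMPLE_ROUTES)
if len(found) > 1:
    print("Multiple default gateways:", found)
```

More than one entry in the result is a hint that two configuration sources are competing for the default route.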

When dealing with complex environments and intermittent problems like this, it is important to isolate the problem and work on it from the bottom up. If I had started at the base of the stack and checked network connectivity first instead of assuming Mongrel was at fault, I would have saved quite a bit of time. Additionally, when dealing with intermittent problems it is important to track down the source and either fix it outright, or force it to fail totally so that the issue can actually be addressed. Once I realized the issue was a routing problem somewhere along the chain, and was able to make it fail predictably, isolating the cause was much easier, because I no longer had to second-guess whether my actions were only making the problem worse.
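The bottom-up rule can be written down as an ordered list of checks that stops at the first failure. This is only an illustrative sketch: the layer names mirror the stack described in this post, the lambdas are stubs standing in for real tests, and `first_failing_layer` is a hypothetical helper.

```python
def first_failing_layer(checks):
    """Run (name, check) pairs in order, lowest layer first.

    Returns the name of the first layer whose check returns a falsy
    value, or None if every layer passes. Layers above the first
    failure are never run.
    """
    for name, check in checks:
        if not check():
            return name
    return None


# Stubbed checks, ordered from the bottom of the stack to the top.
layers = [
    ("network reachability", lambda: True),
    ("bridging firewall", lambda: True),
    ("routing and gateway", lambda: False),  # simulated fault
    ("Apache load balancer", lambda: True),
    ("Mongrel cluster", lambda: True),
]

print(first_failing_layer(layers))
```

With the stubs as written, the walk stops at the routing layer and never touches Apache or Mongrel, which is the order of investigation that would have saved the time spent on Mongrel.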

If you follow these two rules when dealing with complex and intermittent issues, you’ll save yourself a great deal of time and trouble and ultimately come up with an exact solution. In other words, handle bugs in complex environments in as orderly a manner as possible; otherwise your actions may exacerbate the situation.

Smitfraud / Generic Zlob

Posted by Justin Cunningham Mon, 04 Feb 2008 18:41:00 GMT

In the past couple of weeks I’ve run into two different Smitfraud malware infections. Smitfraud is a pain to remove, and has a tendency to show back up. In one instance, it kept manifesting itself with a red desktop background featuring a biohazard symbol and the text “Privacy in Danger”. Traditional spyware removal tools seemed pretty ineffective against this particular problem, but I’ve located the following tools that seem to be of some assistance:

SDFix

SmitFraudFix

SmitRem

ComboFix

When I ran these four tools in succession from Safe Mode on the systems in question, they cleared the malware problems up right away without necessitating a reinstall. I hope Smitfraud isn’t the future of malware, because it isn’t easy to get rid of.