“It’s better to be proactive than reactive”
Performance and availability are very important aspects of a well-running Microsoft Office SharePoint Server (MOSS) 2007 environment. But how do you know that there is an issue, and when it occurs? And once you have determined there is an issue, how do you determine exactly where it is?
First, an understanding of the MOSS farm’s architectural limitations should inform the troubleshooting methodology used to isolate where the bottleneck is and what measures are necessary to eliminate it.
Troubleshooting
Here are some initial questions to ask:
1. What is the server hardware monitoring solution in place within the organization (e.g. HP Compaq Insight, Dell OpenManage or IBM Director)?
2. What network monitoring tools are being utilized (e.g. CA Unicenter or HP OpenView)?
3. What is the software/application monitoring solution used (e.g. Microsoft Operations Manager (MOM) or NetIQ AppManager)?
4. What needs to be monitored?
As mentioned, knowing there is a problem is the first step in the process. Here are some cases where a methodical process could be implemented to determine the issue:
1. A planned outage or load on the system
2. An outage or slowdown of the system for a brief period
3. The system has an outage, or displays symptoms of slowness, with no noticeable pattern
4. The system experiences an outage or slowness in a recurring, systematic pattern
However, some issues are apparent only to end users of the environment. Users know there is an issue because the system does not operate as expected, and they will generally report it to the help desk. Regularly monitoring the volume of support calls over a period can be a good indicator that an issue is present. Issues can also be discovered through system monitoring, which is discussed in more detail below.
After you have determined there is a problem, the next step is to find the cause. A solid troubleshooting technique and process will prove priceless and will save time when problems are discovered. The following steps are recommended when an issue is discovered:
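As a rough illustration of watching support-call volume over a period, the Python sketch below flags days whose call count is well above the recent average. The seven-day window, the two-standard-deviation cutoff and the sample counts are all assumptions chosen for the example, not recommended values.

```python
from statistics import mean, stdev


def flag_unusual_days(daily_calls, window=7, sigmas=2.0):
    """Return the indexes of days whose call count exceeds the trailing
    mean of the previous `window` days by more than `sigmas` standard deviations."""
    flagged = []
    for i in range(window, len(daily_calls)):
        history = daily_calls[i - window:i]
        limit = mean(history) + sigmas * stdev(history)
        if daily_calls[i] > limit:
            flagged.append(i)
    return flagged


# Made-up help desk call counts per day, purely for illustration.
calls = [4, 5, 3, 6, 4, 5, 4, 5, 19, 6]
print(flag_unusual_days(calls))  # [8]
```

A flagged day is only a prompt to start looking; it says nothing about where the problem is.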
1. Check the monitoring logs from the hardware monitor for hardware issues
2. Check network monitoring logs for a traffic load or outage related issue
3. Look into the Windows logs on the SharePoint server(s) for problems that may have occurred
4. Look at the database logs for any load or outage that may have resulted in this issue
5. Check the domain controllers to see if authentication was the point that caused the occurrence
6. Review the SharePoint logs
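The order of the checks above can be captured in a small script so that findings from each source are always reviewed in the same sequence. This is only a sketch in Python: the short layer keys (`hardware`, `network` and so on) and the shape of the `findings` dictionary are assumptions made for illustration, and gathering the findings from each log is left to whatever tools are in place.

```python
# Checklist order follows the recommended steps; the short keys are hypothetical labels.
CHECKLIST = [
    ("hardware", "Hardware monitor logs"),
    ("network", "Network monitoring logs"),
    ("windows", "Windows logs on the SharePoint server(s)"),
    ("database", "Database logs"),
    ("domain", "Domain controllers"),
    ("sharepoint", "SharePoint logs"),
]


def triage(findings):
    """Return (step, source, issues) for every source with findings, in checklist order."""
    report = []
    for step, (key, source) in enumerate(CHECKLIST, start=1):
        issues = findings.get(key, [])
        if issues:
            report.append((step, source, issues))
    return report


# Made-up findings, purely for illustration.
findings = {
    "sharepoint": ["Timeouts while rendering pages"],
    "network": ["Traffic spike on the farm segment"],
}
for step, source, issues in triage(findings):
    print(f"Step {step}: {source}: {'; '.join(issues)}")
```

With the sample input, the network finding is listed before the SharePoint one, so the lower layers are reviewed first, as in the list.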
Once you have a clear understanding of where the issue is, you can proceed to eradicate it. The solution could be as simple as replacing a piece of hardware in one of the servers, or as involved as adding server(s) to a cluster. Other solutions include adding network bandwidth, adding another domain controller to handle more authentication requests, or adding another web front-end server to the farm.
Monitoring
Monitoring an environment is broken up into three major components: hardware, network and software/application monitoring. As with any monitoring solution, if it is not utilized correctly it is of no use. Once metrics are determined and thresholds are set, make sure an individual is responsible for acting when one of the thresholds is reached. In addition, notification must be configured to alert a responsible resource when an issue occurs so that resolution can begin.
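One way to make sure every monitored component has a responsible person is to keep the assignments in one place and check them before relying on alerts. The Python sketch below assumes a simple mapping from component to contact; the component keys and the example.com addresses are hypothetical, and actually delivering the message is left to the monitoring product in use.

```python
# Hypothetical owners for the three monitoring components named above.
OWNERS = {
    "hardware": "server-team@example.com",
    "network": "network-team@example.com",
    "application": "sharepoint-admins@example.com",
}


def unowned(components, owners):
    """Return the components that have no responsible contact assigned."""
    return [c for c in components if not owners.get(c)]


def build_alert(component, counter, value, threshold, owners):
    """Return (recipient, message) for a breached threshold; raise if nobody owns it."""
    recipient = owners.get(component)
    if not recipient:
        raise LookupError(f"No owner assigned for component: {component}")
    message = f"[{component}] {counter} is {value} (threshold {threshold})"
    return recipient, message


print(unowned(["hardware", "network", "application", "database"], OWNERS))
print(build_alert("application", "Processor Queue Length", 14, 10, OWNERS))
```

Raising an error for an unowned component is a deliberate choice here: an alert that reaches nobody is the situation the paragraph above warns about.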
Utilize reporting. Make sure you are keeping track of trends in the three major system components, and that the appropriate information is being reviewed. If thresholds are set for certain services, make sure these specific thresholds are being captured and reported on so that intervention occurs when needed. Doing baseline load testing on the MOSS farm and knowing where the load causes performance degradation is the best way to determine the appropriate monitoring thresholds for each piece of the MOSS environment.
Suggestion: look to implement solutions that work with one another. An example would be the CIM management pack for MOM consolidator. This allows for a more unified monitoring environment that is easier to manage.
Know the limits of the network infrastructure. Make sure the more traffic-intensive services and/or logical applications (e.g. the SharePoint farm) are put in their own VLAN or network segment.
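To illustrate how baseline load-test results might feed into thresholds, the Python sketch below finds the lowest load at which response time becomes unacceptable and derives a warning and an error level from it. The sample measurements, the 2000 ms acceptability limit and the choice to warn at half the degradation point are all assumptions for the example; real values have to come from testing the actual farm.

```python
def degradation_point(samples, max_response_ms):
    """Return the lowest load at which the response time exceeds the limit, or None."""
    for load, response_ms in sorted(samples):
        if response_ms > max_response_ms:
            return load
    return None


def suggest_thresholds(samples, max_response_ms, warning_ratio=0.5):
    """Suggest warning and error levels from (load, response time) baseline samples."""
    limit = degradation_point(samples, max_response_ms)
    if limit is None:
        return None  # the test never reached the degradation point
    return {"warning": int(limit * warning_ratio), "error": limit}


# Made-up (concurrent connections, average response in ms) pairs from a load test.
baseline = [(250, 180), (500, 210), (1000, 350), (1500, 900), (2000, 2600), (2500, 7000)]
print(suggest_thresholds(baseline, max_response_ms=2000))
```

If the function returns None, the load test did not push the farm far enough to show where degradation starts, and the test should be extended before thresholds are set.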
Some examples of counters to monitor, and their thresholds, are as follows:
System Monitor Counter | Threshold
Memory: % Committed Bytes in Use | Greater than 80 percent
Memory: Available MBytes | Less than 50 MB
Web Service: Connection Attempts/sec | Greater than 500 attempts per second
Processor: % Processor Time: _Total (CPU Utilization) | Greater than 80 percent
Current Connections–Warning | 1000 connections
Current Connections–Error | 2000 connections
Disk Usage: Free Space | Less than 10 percent
System: Processor Queue Length | Greater than 10 threads
Memory: Pages/sec | Greater than 220 pages per second
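The thresholds above can be expressed as data and checked against a set of readings. The Python sketch below is a minimal example, assuming the readings have already been collected into a dictionary keyed by counter name; how they are collected is not covered here. Only rows whose comparison direction is stated in the table are included, so the connection and disk rows are left out.

```python
import operator

# (counter, comparison, limit) taken from the rows that state "greater than" or "less than".
THRESHOLDS = [
    ("Memory: % Committed Bytes in Use", operator.gt, 80),
    ("Memory: Available MBytes", operator.lt, 50),
    ("Web Service: Connection Attempts/sec", operator.gt, 500),
    ("Processor: % Processor Time (_Total)", operator.gt, 80),
    ("System: Processor Queue Length", operator.gt, 10),
    ("Memory: Pages/sec", operator.gt, 220),
]


def breached(readings):
    """Return (counter, value, limit) for every reading that crosses its threshold."""
    alerts = []
    for counter, compare, limit in THRESHOLDS:
        value = readings.get(counter)
        if value is not None and compare(value, limit):
            alerts.append((counter, value, limit))
    return alerts


# Made-up readings, purely for illustration.
readings = {
    "Memory: Available MBytes": 42,
    "Processor: % Processor Time (_Total)": 65,
    "Memory: Pages/sec": 300,
}
for counter, value, limit in breached(readings):
    print(f"{counter}: {value} (threshold {limit})")
```

Counters missing from the readings are skipped rather than treated as breaches, which keeps a partial sample from raising false alerts.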
Posted Apr 12 2007, 12:54 PM by cooperfdiv