Welcome to my first Tech Lounge blog post. My name is Drew Bisson and I am the Support Critical Situation Manager. I see many issues and outages, a good portion of which are due to planned changes where people skimped on the planning. So without further ado I give you…
Wax On, Wax Off: Best Practices for Planned Changes
In the 1984 film The Karate Kid, the main character, Daniel LaRusso (played by Ralph Macchio), wants to learn karate. His teacher, Mr. Miyagi, starts his training by having him perform arduous chores like waxing cars, sanding a wooden floor, and painting a fence. Daniel gets fed up with the chores and feels that Mr. Miyagi is never going to actually teach him karate. It is at this point that Mr. Miyagi reveals that Daniel has already learned several defensive blocks through muscle memory from doing the chores. Aha! “What are the laborious chores that will prepare me to become a change control black belt?” I am glad you asked.
First, let me provide some context. When I say “planned changes” I mean intentional changes to a production environment, such as applying an SU or ES. If you perform the following tasks, you will be able to protect your production environment from an entire dojo of Cobra Kai goons, “Known Issues”, unknown issues, or even just simple mistakes.
I. Recognize that you are about to make a significant change, which requires preparation.
Schedule the change. If the work is going to be performed after hours and could result in an after-hours emergency call, notify your Support Regional Management team in advance.
II. Read the documentation.
I know we are all used to a world of readme.txt files that never get read, but when it comes to an ES you MUST read the readme. Sometimes there are pre- or post-install steps that must be completed. Manually running a DB script is one common example; creating a server parameter is another. When it comes to SUs, there are several documents you need to review.
In fact, we list them on the web site:
- The Special Notices section is both directly below the text from the screenshot above and also in the SU ReadMe.
- The ReadMe contains the instructions for installing the SU, including pre- and post-install instructions. These are grouped by SU release, so be sure to read all of them for each SU between the version you are on now and the one you are applying. The ReadMe also contains the uninstall instructions.
- New Features is self-explanatory.
- Summary has a list of all the SCRs that have been fixed in the current and previous SUs.
- The Known Issues section is one of the most important pages to review, because it lists all the SCRs that were significant enough to require ESs. It is worth reviewing that page periodically and not just when you are planning an upgrade. There is an RSS feed for the Known Issues and you can subscribe to a mailing list on that page to be emailed all the Known Issues as they are posted.
- Lastly, if you have a switchover pair you should review the “Procedures for Application of ANY SU on 3.0 Switchover Servers”.
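The "read every ReadMe section in between" rule from the list above can be turned into a quick checklist. The sketch below is only an illustration: it assumes SUs are identified by simple sequential integers, and the function name `sus_to_review` is hypothetical, not part of any product tooling.

```python
def sus_to_review(current_su: int, target_su: int) -> list[int]:
    """List every SU whose ReadMe section should be read before upgrading.

    Assumption: SUs are numbered sequentially (1, 2, 3, ...). The result
    covers each SU after the installed one, up to and including the target.
    """
    if target_su <= current_su:
        raise ValueError("target SU must be newer than the installed SU")
    return list(range(current_su + 1, target_su + 1))


if __name__ == "__main__":
    # Hypothetical example: currently on SU 3, planning to apply SU 6.
    for su in sus_to_review(3, 6):
        print(f"Read the pre- and post-install instructions for SU {su}")
```

Even a list this simple is useful in a change plan, because it makes it obvious when someone has only read the section for the final SU.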
III. Utilize your dev environment.
You should run through the planned change in a development environment. This is a good way to catch any gotchas and avoid surprises. You should then run through the uninstall/rollback procedure as well. Some people (who have not read the readme) may be surprised to find that if you install a Dialer SU and need to uninstall it, you must perform a repair install of the IC Server.
IV. Just in case…
Though you should never need it, it is a good idea to take images of your production IC servers prior to performing the change… just in case.
V. So now you are ready, right? Wrong!
Decide on a timeline and communication plan. If something goes wrong, who needs to be contacted? Who will approve execution of the pre-defined rollback plan, and after how long? Decisions about how long to troubleshoot an issue before executing the rollback plan are best made before you are in an emergency situation. Make sure the person performing the work is mindful of the need to gather logs, a thorough description of the problem, and call IDs if a problem is encountered, so that a root cause analysis can be performed. It is also a good idea to copy the full day’s log folder off to another location. That way, if Support’s investigation leads them to ask for a log that was not initially requested, you will still have it and it will not have been overwritten.
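Copying the day's log folder is easy to script so that nobody has to remember it at 2 AM. The following is a minimal sketch, not an official procedure: it assumes each day's logs live in a folder named by date (YYYY-MM-DD) under a log root, and the function name and all paths are hypothetical. Adjust the layout to match your own servers.

```python
import shutil
from datetime import date
from pathlib import Path


def archive_log_folder(log_root: Path, archive_root: Path, day: date) -> Path:
    """Copy one day's log folder to a separate location and return the copy's path.

    Assumption: logs for a given day are stored in <log_root>/<YYYY-MM-DD>.
    """
    source = log_root / day.isoformat()
    if not source.is_dir():
        raise FileNotFoundError(f"No log folder found at {source}")
    destination = archive_root / day.isoformat()
    # copytree raises FileExistsError if the destination already exists,
    # so an earlier archived copy is never silently overwritten.
    shutil.copytree(source, destination)
    return destination


if __name__ == "__main__":
    # Hypothetical paths; replace with your real log and archive locations.
    copied = archive_log_folder(Path("logs"), Path("log_archive"), date.today())
    print(f"Logs copied to {copied}")
```

Refusing to overwrite an existing copy is a deliberate choice here: the first copy taken after a problem is usually the one Support will want.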
VI. Measure twice, cut once
Sometimes, there can be weeks between when you started your SU planning and when you actually perform the work. If you received an ES for any known issues, be sure to go back and review the Known Issues page or contact support to see if those ESs have been superseded or if new issues have been found.
VII. Change control
Lastly, when the changes are finally made in production, ensure the engineer updates your change control documentation, which should contain relevant information like when the change was made, by whom, what was changed and why.
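If your change control documentation is kept in a spreadsheet or a text file, a tiny record structure helps keep entries consistent. The sketch below simply captures the four fields mentioned above (when, by whom, what, and why). The class name, field names, and sample values are all hypothetical, and your organization may well track more than this.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ChangeRecord:
    """One change control entry: when, by whom, what was changed, and why."""

    performed_at: datetime
    performed_by: str
    what_changed: str
    reason: str

    def summary(self) -> str:
        """Return a single-line entry suitable for a change log."""
        return (
            f"{self.performed_at:%Y-%m-%d %H:%M} | {self.performed_by} | "
            f"{self.what_changed} | {self.reason}"
        )


if __name__ == "__main__":
    # Hypothetical example entry.
    record = ChangeRecord(
        performed_at=datetime(2024, 1, 6, 2, 0),
        performed_by="A. Engineer",
        what_changed="Applied SU to production IC server",
        reason="Pick up fixes listed in the SU Summary",
    )
    print(record.summary())
```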
Simple, right? Of course, just like waxing a car or sanding a section of wood. If you do the chores, you will be surprised to find fewer surprises when you make production changes. Avoiding trouble is more than half the battle, but if… or perhaps I should say when… a roundhouse, leg sweep, or even crane kick comes at your IC server in the middle of an upgrade at 2 AM, you will be prepared to execute your contingency plans as if by simple muscle memory. Banzai!
[Note: I intentionally left out a very important step in my list. Leave a comment pointing out what you think is missing.]
I hope you enjoyed my first blog post. Be sure to look for my next topic, which will not contain any references to Ralph Macchio but might include references to Dr. Gregory House (one of my favorites).