Friday, March 24, 2017

How to Avoid an Epic Outage Like AWS’s

At the end of February, Amazon’s cloud service, AWS, had a major outage. The outage caused some very high-profile sites to go down, including Netflix, Reddit, Quora, Medium and even some government sites.

Overall, AWS service is extraordinarily reliable, and no service has 100% uptime. It’s not surprising that AWS eventually had an outage. What is surprising is the initial cause of the outage, and why recovery took so long.

The outage itself was caused by simple human error. An engineer working on a bug in the billing system took more servers offline than were needed. Unfortunately, like a set of dominos, the additional servers going down took more, and then more.

But that’s not where the story ends. Because so much of the system went down, the systems required full restarts to recover. It was these full restarts that had the systems down for multiple hours.

Amazon has said that they will be adding in additional safeguards to prevent this kind of issue from occurring again, certainly at this magnitude. They are taking the healthy and most appropriate path out of the problem. They are looking closely at the issues and finding ways to correct them so they never end up in the same place again.

And while Amazon should be praised for facing the problems that set them up for the massive outage, they are Amazon, the premier cloud service provider with a global presence. Thanks to their near-constant uptime, they can take a hit like this and not see significant damage to their bottom line.

In the same situation, though, you may not be so lucky. A time like this is the perfect time to think about your own system maintenance and outage plans.

Less than Disaster

Typically when we in IT talk about outage planning, we talk about disaster recovery. Don’t get me wrong, disaster recovery is a good thing and a worthwhile investment. If you don’t have a disaster recovery plan, you’re rolling the dice.

But we usually equate disaster recovery with exactly that – a disaster. Hurricanes, earthquakes, massive cyberattacks that cost you terabytes of data – these are the kinds of things mentioned in many disaster recovery documents and presentations.

Not to be dramatic, but for smaller companies, and even mid-sized enterprises, being down for a few hours at the wrong time is a disaster for your organization. If you’re a retail organization and you go down for half a day on Cyber Monday, well, that’s a disaster. If you’re a university and your registration system goes down just as registration opens, that’s a disaster.

Any outage during a peak business time can mean significant trouble. It’s why large IT organizations have blackout periods for new software releases during critical business periods. It’s not worth the risk.

And that’s what we’re really talking about here, risk management versus disaster recovery. For instance, think for a minute about driving your car. Risk management is like obeying traffic laws and driving defensively. You’re doing what you can to avoid getting into an accident. Disaster recovery is like car insurance. When the unexpected happens, you’re glad you have it.

Having a disaster recovery plan but not a risk management plan is like driving recklessly all the time because you have car insurance.

Managing the Risks

Depending on the special needs of your organization, risk management can mean a number of things. That’s because it’s specific to your business risks. That retail organization in the above example will have some risks that are different than the school, and some that are the same.

What’s really important when looking at risk management is acknowledging that it’s a process that involves problem identification, fixing what you can and planning for what you can’t.

Those in an ITIL or COBIT managed organization are probably familiar with the problem identification piece, and have likely even participated in fixing some issues. But organizations shouldn’t stop there and hope that they’ll never have to deal with the problems associated with something they can’t fix.

Let’s take a quick look at an admittedly forced example.

In our example, your teams are evaluating the potential risks of their systems going down. They identify a system that takes 5 hours to completely reset, based on all of the server dependencies and additional processes needed to restart everything involved with that system. This is the identification phase.

The teams go through and find ways to reduce that reset time by removing out of date dependencies and better aligning parts of the systems that can be reset in tandem. Maybe they need to update old software or perform patches that were slowing things down. They have fixed part of the risk.

But this leaves a 3-hour window where your system may be unavailable in the event of an unexpected system restart. Maybe that’s fine if it’s in the middle of the night. But that never seems to be when critical systems go down.
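
As a rough illustration of how resetting parts of a system in tandem shortens the window, the sketch below compares a one-at-a-time restart with a restart where each service starts as soon as its dependencies are back. The service names, durations and dependencies are invented for this example and chosen to match the 5-hour and 3-hour figures above; a real inventory would come from your own teams.

```python
from graphlib import TopologicalSorter

# Hypothetical restart times in hours (illustrative values only).
RESTART_HOURS = {
    "database": 1.0,
    "cache": 1.0,
    "auth": 1.0,
    "search": 1.0,
    "api": 0.5,
    "web": 0.5,
}

# Hypothetical dependencies: each service waits for the ones listed.
DEPENDS_ON = {
    "database": [],
    "cache": [],
    "auth": ["database"],
    "search": ["database"],
    "api": ["auth", "cache", "search"],
    "web": ["api"],
}


def sequential_restart_hours(hours):
    """Total time if every service is restarted one after another."""
    return sum(hours.values())


def tandem_restart_hours(hours, depends_on):
    """Total time if each service starts as soon as its dependencies finish."""
    finish = {}
    # static_order() yields dependencies before the services that need them.
    for service in TopologicalSorter(depends_on).static_order():
        start = max((finish[dep] for dep in depends_on[service]), default=0.0)
        finish[service] = start + hours[service]
    return max(finish.values())


print(sequential_restart_hours(RESTART_HOURS))          # 5.0
print(tandem_restart_hours(RESTART_HOURS, DEPENDS_ON))  # 3.0
```

The remaining 3 hours is the longest chain of dependencies (database, then auth or search, then api, then web). Shortening it further means removing a dependency or speeding up a service on that chain.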

Some companies stop here and just assume they have done what they can. Instead, take the time to consider any potential workarounds. Is there another system that can take up the slack? Can customers be offloaded to your call center during the outage? These may be quick and easy ways to address the downtime.

Perhaps it’s a more critical system than that. If you’ve got regional redundancy in your systems, through AWS for instance or even your own, private network, can the workload be shifted to the same system in another region? It might be slow, but slow is better than down. Think through your alternatives, including redundancy, to identify issues on systems that are critical for business continuity. 
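
The region-shifting idea can be sketched as a simple preference list: use the primary region when it is healthy and fall back to the next one when it is not. The function name, the region list and the health values below are hypothetical. How you determine health (monitoring, health checks, DNS) depends on your own setup and is not shown here.

```python
def choose_region(preferred_order, is_healthy):
    """Return the first healthy region in preference order, or None if all are down."""
    for region in preferred_order:
        # A region with no health report is treated as unhealthy.
        if is_healthy.get(region, False):
            return region
    return None


# Hypothetical snapshot: the primary region is down, the secondary is up.
health = {"us-east-1": False, "us-west-2": True}
print(choose_region(["us-east-1", "us-west-2"], health))  # us-west-2
```

A return value of None is the case the workaround questions above are meant for: no region can take the load, so the call center or another system has to.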

The last piece of risk management is as important as the first three. When an outage happens, take the time to do a root cause analysis, much like AWS did with their systems. Understand what went wrong and look at what can be fixed or what checks can be put in place to prevent that problem from happening again. And then implement those fixes. It might seem overwhelming at first, but over time it will become part of your regular workflow.

Fixing problems associated with risk might seem like adding additional burden to your already overloaded IT teams. Bringing in a partner that can work through remedies to your biggest issues can relieve some of the stress on your teams, while still providing your organization the protection it needs to keep the business running smoothly. Regardless how you cope with the additional work, risk management is one of the most important steps you can take to ensure your business can effectively operate when the inevitable happens.


The post How to Avoid an Epic Outage Like AWS’s originally appeared on the Curotec Blog

Friday, March 17, 2017

The Two Sides of Wellness and Wearables in the Workplace

Wellness programs are not a new thing for companies. It’s long been understood that a healthier workforce leads to a more productive environment with fewer sick days for employees. In fact, companies lose around $164 billion in productivity, annually, to obesity-related issues.

Wellness programs go beyond weight management and physical health care. Recent trends in corporate programs include emotional wellness components, which help to drive employee engagement. These programs include mindfulness practices like yoga and meditation.

In the last few years, these programs have expanded to include the use of wearable trackers and gamification elements to motivate employees and keep them moving towards the goals of better health.

Health insurance providers are in full support of using devices and tracking to help create a healthier workforce. In fact, insurers are reducing corporate rates for companies where health trackers are used.

Technology is a significant enabler to the successful use of corporate wellness programs. But these programs can create concerns, both for employees and for IT departments.

The Technology of Wellness Programs

Companies have a few options when it comes to incorporating wellness programs into their operations, from insurer sponsored programs to independent companies that integrate with your organization’s goals.

For these programs to be effective and for some of the more motivating features to be used, there needs to be some level of reporting and tracking. These features need to be convenient, accessible and always available to encourage use.

Technology is the unifying element of these programs, no matter where the wellness program is sourced from. Between websites and mobile applications, these providers make it easy for employees to record their activities and participate in online education programs no matter where they are or when they have time.

Making it even easier is the use of wearable devices. Step trackers and heart rate monitors allow stats to be added to an employee’s profile without requiring the user to think about it. They also provide an unbiased third party report of activity.

All of this combined leads to generally accurate reporting and opens the door for gamification and intrinsic rewards to be used to keep employees on track.

Between wearables, websites and mobile applications, employees have tools available that can help them focus in and achieve their health goals. But the very devices that are enabling the workforce to get healthy can be dangers to the enterprise that is supporting their use.

IT Strains and Risks

The challenges that these applications bring are nothing new to IT. The difference here is that these activities are now endorsed, and even encouraged, by the organization. As such, IT must make accommodations for these risks. The good news is, the problems are ones that IT is already addressing.

One of the concerns is the program websites. While allowing access to outside sites is commonplace for most industries, the sites associated with wellness programs require logins and contain personal information. Training personnel to use unique passwords on outside systems is important in these situations, as is reminding them of the importance of secure passwords.

Because these sites are outside of a company’s sphere of influence, it’s difficult to tell when a security threat, like a virus, is introduced that can affect your corporate network. There is also a greater opportunity for phishing schemes and other social engineering attempts as there is a trusted outside company that could legitimately be looking for information from an employee.

Mobile applications and devices also increase the threat surface. As with any organization that allows BYOD – Bring Your Own Device – concerns around corporate information security and data leakage need to be taken into consideration. Enterprise mobility management and application management solutions can help with these risks, but no single solution is perfect. These solutions don’t address issues like compromised or rooted phones or access to corporate assets if a device is lost or stolen.

As with third party websites, mobile apps can create an access threat through compromised code as well. But given that mobile apps are particularly effective when it comes to wellness programs – digital health apps are identified by consumers as the second most important element in helping support their goals – a wellness program that doesn’t include access to apps may be getting in its own way. Adding threat testing of these outside applications can help to alleviate the worst of the concerns for enterprise IT departments.

And then there are wearables. Wearables increase the number of access points. So, if you’re allowing the use of these items on your corporate network, you’re inherently increasing the number of places from which a hacker can gain entry into your system. Ensuring that all devices, including mobile and wearables, adhere to your security policies is important to keep your network safe. Also, consider your network’s topology. Is it possible to allow access through a specific entry point, but still restrict the data available when entering through that point?

In addition to the concerns that your enterprise information security team may have, your employees may also be worried. Because their apps and devices are collecting information about their health habits, some team members may be concerned about who can access that data, and how it will be used. Educating and informing your workforce as to who has access to their information can help to reduce the anxiety employees feel about using tracking devices and employer-sponsored wellness programs.

Wellness programs are proven ways to encourage healthy lifestyles with your workforce. More importantly, your employees are more engaged and more productive when they are healthy. Today, technology plays a huge role in helping employees participate in these programs. And while the challenges to IT can be considerable, they can be managed with good IT security practices that most enterprises already have in place, combined with common sense security training.


The post The Two Sides of Wellness and Wearables in the Workplace originally appeared on the Curotec Blog

Wednesday, March 1, 2017

Amazon’s AWS S3 Storage Service Experiences Massive Outage

Starting at around 1 pm ET today, Amazon’s S3 storage solution began seeing high error rates out of US-EAST-1. Web sites and users across the US experienced outage issues with sites, both large and small. Included in the list were sites like Medium, Slack, Sprout Social, Adobe’s services, Flipboard, Quora, Business Insider, Netflix, Reddit and even the Securities and Exchange Commission.

With almost half of AWS’s million clients using the storage solution, it’s not surprising that the outage has been felt so significantly. While some only use the service for image storage, other organizations use S3 to host their websites. The service reportedly stores 3 to 4 trillion pieces of data.

Amazon is working diligently to remediate the problem, but with their own service dashboard using S3 to store their status images, it was difficult for a while to understand what services were up or down without diving into specific service updates.

Outages like the one experienced today are rare, but because so many high-profile companies use AWS, it becomes very apparent when problems occur. Such issues are the reality of IT and servers, whether public or private. The expectation that a single service will have a perfect uptime record is unrealistic.

With that in mind, companies with mission-critical applications that require high availability should consider replicating their applications or sites across Regions.

AWS distributes their data centers into Regions, which are physical locations. But in addition to Regions, AWS has created Availability Zones, which are separately housed, discrete data centers located in the same region. These data centers have redundant everything – power, connectivity, and networking – to make them as fault tolerant as possible.

But for those who need additional fault tolerance, AWS offers the ability to replicate your data in different geographical regions. You retain control of the instances regardless of physical location, which allows companies with local compliance and data residency restrictions to manage those aspects themselves.
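
To get a feel for what a second region can buy you, here is a back-of-the-envelope calculation. The 99.9% figure is an illustrative assumption, not a published AWS number, and the math assumes regions fail independently and that failover is instant. Both assumptions are optimistic, so treat the result as an upper bound.

```python
HOURS_PER_YEAR = 24 * 365


def combined_availability(per_region, regions):
    """Chance that at least one of several independent regions is up."""
    return 1 - (1 - per_region) ** regions


def expected_downtime_hours(availability):
    """Expected hours of downtime per year at a given availability."""
    return (1 - availability) * HOURS_PER_YEAR


single = 0.999  # assumed availability of one region (illustrative)
dual = combined_availability(single, 2)

print(f"one region:  {expected_downtime_hours(single):.2f} hours/year")
print(f"two regions: {expected_downtime_hours(dual):.4f} hours/year")
```

Under these assumptions, one region at 99.9% is down for just under 9 hours a year, while two regions together are down for well under a minute.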

While an AWS outage is annoying, it’s important to remember that Amazon has one of the best uptime ratings of any of the cloud providers. Downtime is a reality in any server environment, but there are strategies, like multi-region architecture, to ensure a more consistent uptime experience.


The post Amazon’s AWS S3 Storage Service Experiences Massive Outage originally appeared on the Curotec Blog

Shadow IT Doesn’t Have to be Your Enemy

Even its name sounds a little frightening.

“Shadow IT”. It sounds like something lurking in the corner, waiting to pounce. And its other names are worse, with some CIOs calling it “rogue” or “feral”.

The truth is, Shadow IT can be pretty scary for IT leadership. It creates more risks for the organization than just information security issues. It can even cause friction – or greater friction – between IT and other internal teams.

But there can also be benefits to Shadow IT if you’re willing to embrace it, prepare for it, and develop inclusive policies and education regarding it.

What is Shadow IT?

Gartner defines Shadow IT as “IT devices, software, and services outside the ownership or control of IT organizations”. But of course, reality is more nuanced than a simple definition.

Basically, Shadow IT starts out, and thrives, in organizations that either enable departments to do what they want or in companies where IT says “no” more often than they say “yes”.

If you have departments that adopt their own software, that’s Shadow IT.

If you have groups that have licensed their own cloud services, that’s Shadow IT.

If you have teams that have siloed themselves by using solutions that haven’t been vetted by IT, that’s Shadow IT.

This is a situation that has been exacerbated by the Bring Your Own Device (BYOD) trend that many enterprises are seeing and even encouraging. When combined with the overall tech savviness of the average person, Shadow IT seems like an obvious outcome.

While IT departments are trying to keep control around what’s used for both support and security reasons, some are seen as strict gatekeepers that are more likely to deny a request than consider it. Or, equally bad, an IT department may seem willing to evaluate solutions, but not have the resources to do so quickly or efficiently. Other departments are “helping” when they go off book and find their own solutions.

What many outside of IT don’t realize are the problems they can cause when they seek out their own technologies. Even IT doesn’t always understand all of the risks associated with allowing Shadow IT to run rampant.

Shadow IT can Introduce Risk

The one risk of Shadow IT that is painfully obvious to anyone in traditional IT is security. Without an awareness of the risks associated with random software acquisition and installation, a department making its own software choices could potentially open up the entire network to risk.

Support is another concern. While a team within the organization may have chosen, and may even be supporting, its own solution, it may not play nicely with other apps approved by IT. It might not even work well on the available equipment provided by IT. As a result, IT gets pulled in to troubleshoot systems sporting software and services they have no knowledge of.

Many organizations must deal with various levels of compliance. Whether that is tracking required by IT as part of Sarbanes-Oxley, or stricter requirements like PCI or HIPAA, organizations that allow or encourage non-IT teams to adopt their own IT equipment, software and platforms can put the entire organization at risk of being out of compliance.

Many groups will argue that they are using their own budgets for their Shadow IT initiatives, so it shouldn’t be a concern of IT. But costs are a larger concern than just what fits into an individual department’s budget. For instance, if multiple internal organizations have contracted individually with the same 3rd party, the enterprise may be missing out on savings associated with volume licensing.

Lastly, Shadow IT can create integration nightmares for IT. If two internal teams need their software to talk to one another, but they are using disparate solutions, they may turn to IT to connect their data silos. Without having vetted the vendors, one, or both, solutions may be built on platforms unfamiliar to your IT organization. Or one could have no external interfaces available at all. A problem that could have been cut off during the evaluation process has now become a headache for the central IT organization.

How to Incorporate Shadow IT

As risky as Shadow IT can be, it’s unlikely that you’ll be able to completely remove it from your enterprise, especially if it already has a foothold within the organization. But it might not even be in your best interest to remove all facets of Shadow IT.

Instead, working with the various teams within your organization can allow them some control over their solutions, while relieving IT from dealing with multiple demands with dwindling resources.

First and foremost, you should make sure that anyone considering investigating their own solution gets an understanding and some training on the security risks they need to be aware of during the evaluation process. And if you’re under compliance requirements, you want these organizations to understand what is required to meet the compliance rules.

Next, your policies around individual departments adopting their own software and services should include requirements that IT be aware before a choice is made, during the requirements gathering and definition phase. The intent here is not to tell other teams “no”, but to make them aware of other teams that are using similar software, or teams that may have a similar need.

If these multiple teams can agree on a single solution, they can split the costs across their budgets, and potentially gain the benefits of volume licensing. It also gives the central IT organization the opportunity to guide these departments to solutions that are known to operate well within the existing technology ecosystem.

Shadow IT doesn’t need to be a thorn in the side of your traditional IT department. It’s possible for individual organizations to work with the central IT organization to get what they need, while still meeting the requirements and mitigating the risks to the larger organization. Training, planning, and becoming part of the Shadow IT process gives you insight into the needs of these teams without becoming the department of “no”.


The post Shadow IT Doesn’t Have to be Your Enemy originally appeared on the Curotec Blog
