Developer Case Study: Weaving Visibility into Carhartt’s eCommerce Systems
For Carhartt, a dedicated full-stack observability team has positive impacts inside the IT organization, across the business entity, and for customers worldwide.
Since 1889, Carhartt has been producing workwear known for rugged construction, innovative design, and exceptional standards of quality, durability, and comfort. Founder Hamilton Carhartt, whose mission was to provide railroad workers with heavy-duty bibs, worked hard to establish trust with his market. This legacy has earned Carhartt respect among everyone from construction workers to outdoors enthusiasts to farmers to miners — and even celebrities.
Transparency and commitment to quality have been the keys to fostering this customer relationship for more than 130 years. During the past 20 years, Carhartt, like many retailers, has taken its production and sales models through a digital transformation. Just as Hamilton Carhartt worked to establish trust with customers across the American frontier, the Carhartt application performance team has worked to establish trust along the digital frontier, ensuring visibility into the critical applications that run everything from manufacturing to the online storefront and helping to minimize downtime.
Bryan Laszlo is the Application Performance Manager upholding Carhartt’s reputation. Thanks to Laszlo’s team, Carhartt can see that its digital processes to manufacture, distribute, and sell its garments run smoothly.
With the advent of multi-cloud environments and microservices, Laszlo’s team, like many IT organizations, faced an increasingly complex landscape. Laszlo’s story is an example of how essential it is for IT teams to evolve to match this changing landscape.
Carhartt identifies a need for visibility
Carhartt’s full-stack observability story starts in 2018. Laszlo, at the time an infrastructure engineering manager, was already familiar with the impact of network downtime. When that was coupled with application downtime, specifically in merchandising applications such as SAP, which are used for manufacturing, operations, distribution, and point-of-sale, the business impact was obvious. In retail, time is quite literally money if the product can’t be produced or purchased.
The state of Carhartt IT before full-stack observability
Before implementing full-stack observability, when an issue occurred, if the right engineer got an alert, they would start digging into the systems, hoping to find and fix the issues quickly. Other times, a customer or client would call the help desk and say, ‘Something isn’t working. Is something down?’ This customer touchpoint is not something any retailer wants, especially an organization that relies heavily on ecommerce.
“We were very reactive. We were reacting to a monitor or customer telling us something was up,” says Laszlo. “We needed a tool to help us get ahead of issues on our site and be more proactive.”
This immediate customer impact was just the surface of the effects downtime had on Carhartt’s IT team. Determining the root cause of an issue was often like finding a needle in a haystack.
“Coming from the network side, when an incident occurred, I knew that sometimes the incident wasn’t the network’s fault. However, we always had to prove that the network was still up before we could address the actual problem. I would say, 8 out of 10 times it wasn’t the network. An issue with the network is usually obvious, but if the issue is slow and intermittent, it can be time consuming to troubleshoot what is going on,” says Laszlo.
The team used what it had at the time to determine the root cause, including third-party applications’ status centers. But, as Laszlo soon realized, these status centers lacked the business context that comes with integrating across the network and all applications.
The first step was understanding what was needed: a tool modular enough to customize for Carhartt’s unique application stack, flexible enough for individuals to drill down into specific issues, yet accessible for those without specific developer or DevOps training.
Carhartt pilots full-stack observability on ecommerce
Looking at the organization’s needs and Carhartt’s unique setup, the IT team landed on Cisco AppDynamics as the tool for the job.
The Cisco AppDynamics pilot for ecommerce was an immediate success. From deployment in 2018 onward, including throughout the challenges brought on by the COVID-19 pandemic, Carhartt was able to streamline efforts to keep its ecommerce applications running.
“Once we had visibility into ecommerce, we could very quickly identify which teams to pull in. For example, if the issue was in the code, we knew it made sense to get the direct-to-consumer team involved rather than the network team,” says Laszlo.
Initial success ignites plans for expanded use of AppDynamics
Soon, the success of Cisco AppDynamics caught fire, and Laszlo expanded use of the tool to include SAP.
“[Cisco] AppDynamics was originally only focused on our ecommerce journey. We had visibility into that, but that’s all we had. If anything was going on with SAP or something else, we were stuck hunting and pecking and engaging 15 different people to figure out what was going on,” says Laszlo. “We really started to see the benefit [within ecommerce], and that’s where we decided to expand [Cisco] AppDynamics into other critical applications like SAP.”
Carhartt then deployed Cisco AppDynamics for its instance of SAP in tandem with a larger cloud migration. Because of this intentionally timed deployment, Cisco AppDynamics was able both to guide the team as it determined the size of the environment needed and to proactively notify the team of load-related issues.
“SAP has always been the beating heart of Carhartt from a business sense,” says Laszlo. “From the beginning of re-launching SAP, we ran [Cisco] AppDynamics across the board, in production and non-production environments so we could see everything that was going on. There were so many scenarios that [Cisco] AppDynamics helped us figure out what was going on when something wasn’t working right. It was very, very powerful, and it assured success from a technical level, making sure we were able to build the platform in the way we needed to, and that it was running the way we expected it to.”
Historically, SAP would become heavily loaded through Carhartt’s busiest season, from Black Friday through the holidays. By moving to the cloud and running AppDynamics with SAP, Carhartt has been able to right-size the infrastructure. “We can load-test and see what is going on and get ahead of any issues that we see being in the cloud,” says Laszlo. “The full-stack observability tools help us do that.”
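The right-sizing idea described here can be sketched in a few lines. This is a minimal illustration only, not Carhartt’s method or an AppDynamics API. It assumes a load test has produced per-node CPU utilization samples (as percentages) and that load spreads evenly and scales roughly linearly with node count. The function names, the 95th-percentile choice, and the 60% target utilization are all hypothetical.

```python
import math


def percentile(samples, pct):
    """Return the nearest-rank percentile (pct in 0-100) of a non-empty list."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # Nearest-rank method: the smallest value that has at least
    # pct percent of the samples at or below it.
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def recommend_node_count(cpu_samples, current_nodes, target_pct=60.0, pct=95):
    """Estimate how many nodes would keep peak CPU near target_pct.

    Assumes load spreads evenly and scales linearly with node count,
    which is a simplification made for illustration (hypothetical model).
    """
    peak = percentile(cpu_samples, pct)
    # Total observed work, expressed in "node-percent" units.
    total_load = peak * current_nodes
    return max(1, math.ceil(total_load / target_pct))


if __name__ == "__main__":
    # Hypothetical per-node CPU readings (%) captured during a load test.
    samples = [42, 55, 61, 70, 88, 91, 76, 64, 59, 83]
    print(recommend_node_count(samples, current_nodes=4))
```

A real sizing exercise would weigh memory, I/O, and database limits as well. The sketch only shows how load-test measurements can turn a sizing question into a calculation rather than a guess.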
How Carhartt spun up an entire Application Performance Team
The immediate success of Cisco AppDynamics only strengthened Laszlo’s theory about the benefits of implementing a complete full-stack observability model. While maintaining ecommerce application performance via Cisco AppDynamics, Laszlo continued researching the opportunities to increase observability, and brought an idea to his leadership team. Was there enough of a benefit here for a dedicated application performance monitoring role?
“The problem we’ve always had is we couldn’t carve out a role to focus on application performance and getting the value out of [Cisco] AppDynamics that we really wanted to get. I drew a picture that showed all the places we didn’t have observability in our critical transactions, and showed it to my boss, who showed it to his leadership team. And, as we started to work through things, and with my experience in networking, and that end-to-end view in thinking that I’ve always had, [leadership] saw the opportunity to move me to that role,” says Laszlo.
Leadership didn’t stop here — the need was recognized as significant enough to spin up an entire Application Performance team, with Laszlo at the helm. The team will consist of the technical SAP team that Laszlo has historically led, as well as two database administrators, and two analysts: one more business-focused and one more technical.
“We’re not going to be a DevOps shop because we’re not heavy on development, but there are some principles there and some ways that people work together that would be very valuable. The team will take Carhartt from just [Cisco] AppDynamics to true full-stack observability. With a dedicated team, we can fill the gaps identified in the picture and finally get true observability of all our critical business transactions,” says Laszlo.
An example diagram of the visibility gaps that Carhartt had to fill.
The team adds tools to fill in gaps
One of the most critical gaps that Laszlo identified in his research was within retail stores.
“We’d have an issue and wouldn’t know if it was occurring from the point-of-sale systems, or if there was something going on in the network or the Wi-Fi. We just didn’t have good visibility, and it was guesswork to identify the root cause,” says Laszlo.
The big piece missing? Network visibility. Laszlo leaned on his networking background to choose the right tool to bring networking visibility in line with his improved application visibility.
Enter Cisco ThousandEyes to start completing the full-stack observability picture.
“We’re still building [Cisco] ThousandEyes now, but we’re starting to be able to show the reachability with [Cisco] ThousandEyes and do the tests that we need both inside and outside of our walls, helping us isolate trouble faster – carving away at our most critical visibility gaps,” says Laszlo.
The tools work in unison, moving the needle toward a more complete full-stack observability picture. “Moving towards full-stack observability, we are now able to identify if the network isn’t answering or if there is a bug in the code, and get the issue fixed.”
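The network-versus-code triage Laszlo describes can be illustrated with a simple decision rule. The sketch below is an assumption-laden example, not a ThousandEyes or AppDynamics feature. It takes two signals, whether a network reachability test succeeded and the application’s error rate, and suggests which team to engage first. The `Signals` type, the `triage` function, the 5% threshold, and the team labels are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signals:
    """Hypothetical snapshot of two observability signals for one service."""
    network_reachable: bool  # e.g. the result of a synthetic reachability test
    app_error_rate: float    # fraction of failed requests, 0.0 to 1.0


def triage(signals, error_threshold=0.05):
    """Suggest which team to engage first, checking the network before the code."""
    if not 0.0 <= signals.app_error_rate <= 1.0:
        raise ValueError("app_error_rate must be between 0.0 and 1.0")
    if not signals.network_reachable:
        # The network is not answering, so application metrics are unreliable.
        return "network"
    if signals.app_error_rate > error_threshold:
        # The network is fine but requests are failing: look at the code.
        return "application"
    return "no action"


if __name__ == "__main__":
    print(triage(Signals(network_reachable=False, app_error_rate=0.0)))
    print(triage(Signals(network_reachable=True, app_error_rate=0.12)))
    print(triage(Signals(network_reachable=True, app_error_rate=0.01)))
```

Checking reachability first mirrors the point made earlier in the story: the network no longer has to be proven innocent by hand before the actual problem can be addressed.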
The impact of full-stack observability across Carhartt’s greater IT team
As Carhartt has moved toward full-stack observability, the organization has started to think like a DevOps shop. With this mindset comes an improved ability to communicate with internal and external stakeholders, ultimately freeing developers to attend to incidents. Long phone calls in which engineers and developers explained the meaning behind downed networks or buggy code have been replaced by visuals in Cisco AppDynamics and Cisco ThousandEyes, which give stakeholders a clearer understanding of the impact and severity of incidents.
“We’re now able to have conversations about things that people understand. People may not understand networking or code, but when you can show a picture of what’s going on and start talking that way, that changes what’s going on and makes us more efficient,” says Laszlo.
The full-stack observability effect on collaboration
Before moving toward full-stack observability, Carhartt’s strong culture of collaboration enabled the team to work together to identify issues. Without visibility, this culture was a necessity. With visibility? The culture is a team strength rather than a vital organ, and observability tools have allowed the team to free up resources to focus on other projects.
“Carhartt’s culture of collaboration has always been to pull together and find the issue. But why bring in folks you don’t need if you can isolate the issue and troubleshoot faster?”
Coupled with the move to the cloud, full-stack observability has enabled Carhartt to become proactive and start addressing technical debt. “Full-stack observability has given the team time to go learn things. Team members have been able to take a week off to go take a class and learn a new skill, which is vital right now because we have reskilling needs and gaps that we need to fill.”
Moreover, the team can dedicate more resources to projects instead of overworking team members who have to switch between operations and projects. “And, I am seeing people have more time, less stress, and it has made a noticeable difference.”
A future of learning, AI, and evolving tools
Laszlo’s team is committed to learning more about their toolset, and to getting more out of it.
“One of my goals is to understand how we can set up a learning program, set up learning days, and evangelize this product, and get even more people using it.”
“We’re really trying to enable other teams to use these tools that we’re the caretakers of — they’re the ones that will benefit the most from these tools.”
Looking ahead, Laszlo is excited about the opportunities that evolving full-stack observability products and platforms, along with the potential application of AI, will bring to the team.
When thinking about the most critical features of a full-stack observability tool, Laszlo adds, “Moving to OpenTelemetry (a smart move by Cisco to build on that) so we can integrate with our systems is important… ease of use is also number one. It has to do the job. From there, how easily can you get the information you need? How easily can you get to a view you need to see? A co-pilot feature with artificial intelligence can help a non-developer understand what is going on between, say, midnight and 2 a.m. on Friday nights. It abstracts away the background complexity that has to happen to make these products work and makes the end-user view very simple. In the end, it’s about getting the right info to the right people at the right time.”
Thanks to Laszlo’s vision, the right tools, and leadership support, Carhartt provides the same reliability in its applications and networks that it has been known for in its clothing since 1889.
Resources
Developer Site: Full-stack observability
Learning Lab: Full-stack observability
Infographic: Unraveling Endpoint Complexity
Developer Site: AppDynamics
Documentation: ThousandEyes
Learning Lab: ThousandEyes
Blog: Carhartt drives exponential growth with new revenue streams – backed by Cisco application solutions
Overall Business Case Study: Carhartt