
How to manage IT infrastructure in a fast-growing company: the DataRobot experience

This material was provided by ASBIS Enterprises Plc. If you have any questions about it, or want to know in more detail how the Apple Enterprise Management Solution works, please contact us at info@appleenterprise.asbis.com.

Hi! My name is Oleg Sokhan, I am an IT Operations Engineer at DataRobot.

We are a technology company based in Boston, with offices in many cities around the world, including Washington, DC, Columbus, Copenhagen, Kiev, San Francisco, Singapore and Tokyo.

Our IT team supports the company’s operations 24/7/365. We run day-to-day operations and are responsible for ensuring that every employee has access to the IT resources they need and receives everything required from the team to be productive and successful. We also manage hardware environments, install critical software updates, and proactively work on IT projects that improve the efficiency of the team as a whole. We have 1500+ computers in our area of responsibility.

In this article I want to tell you how we manage our IT infrastructure with the Jamf MDM (Mobile Device Management) platform, why we chose this particular tool, and what problems it helped us solve.

Why choose macOS and Jamf

When choosing the infrastructure to work with, we settled on Apple as the primary platform for several reasons:

  • Most of our developers and DevOps engineers come from the Unix world. And macOS is an operating system from that family (Darwin is an open-source Unix-like operating system).
  • macOS offers a good mix of handy development tools and applications for everyday business tasks, including Microsoft Office, Safari/Chrome/Firefox, Slack, and Cisco Webex, Google Meet, or Zoom for employee communication.
  • Apple computers are a single entity: the platform and software are developed by a single manufacturer. This avoids many compatibility problems.
  • Each new macOS release has many innovations and trending technologies, both in development and cybersecurity.
  • Apple computers are more durable in terms of technological aging – it’s a sound business investment.
  • We are not exposed to all the dangers of the Windows world, whether ransomware or tons of malware*.

* Without getting into a debate: yes, such attacks happen on macOS too, but they are rare.

To date, security researchers have identified several examples of Mac ransomware, but none of them have resulted in serious outbreaks.

Last year the company approached the threshold of 1,000 employees, and we realized that we had grown out of “short pants”. It had become too difficult and inefficient to manually manage an IT infrastructure “scattered” across five continents. In addition, we needed to ensure the security of the devices used by employees, and the rapid increase in headcount raised the challenge of scaling the business. So we started thinking about optimizing internal IT resources, and one of the measures was the implementation of MDM, a service for mobile device management.

We spent months looking for the best solution for our infrastructure. Since the majority of the company’s computers are Apple computers, we chose the market leader, the Jamf MDM platform. More than 40,000 companies in the world use it.

We have created a checklist of tasks that we want to solve with this product:

  • perform initial preparation and setup of the computer for IT onboarding tasks.
  • apply corporate policies to an endpoint, including an employee’s computer.
  • manage system updates and updates of installed applications, and deploy critical updates quickly.
  • bring computers into compliance with basic security standards and regulatory requirements, adhere to best practices in this area, manage patches, and combat vulnerabilities.
  • automate software deployment to increase productivity.
  • collect large amounts of informative data and use it to make the right decisions when optimizing the company’s business processes.
  • deploy the IT Self-Service application to the endpoints (I will tell you about it below).

The process of implementing the new system took us almost a year. All this required a huge amount of work, but we are satisfied with the results achieved. With the help of Jamf we solved a huge number of tasks on managing the company’s IT infrastructure. It’s impossible to talk about all of them in this article, so I’ll focus on a few key ones.

Here are some of the results we achieved with Jamf.

Automated inventory

Ten years ago, almost all work computers were desktop computers and were inside the company perimeter. But trends have changed over time. Now about 95% of our computers are mobile devices, in our case, laptops. Employees work with them both inside and outside the office: they take them on business trips, they connect remotely from home, which has become especially relevant during the pandemic.

Therefore, several questions arise:

  • how to solve the classic problem of inventory of IT resources of the company, i.e., to obtain data in the context of “computer-owner”.
  • how to find out what software, what versions and configurations are installed on each computer.
  • how to deliver and install critical updates to your endpoint.
  • how to be sure that the updates are successfully installed.
  • how to manage the process of remote installation of updates in general.

Software statistics are needed, for example, to counteract security threats to corporate computers, simply to make safe use of installed software and, most importantly, to prevent leaks of corporate information.

For example, the developers of an application such as Zoom or Google Chrome announce that all versions except the most recent one contain a vulnerability. This means we need to get a report as soon as possible on which computers are affected and force an update of that application in all offices of the company, as well as on the computers of employees who work remotely in different time zones. At the same time, it is important not to interrupt the user’s work or, failing that, to interrupt it only briefly, with a notification that this is a DataRobot IT activity aimed at fixing a specific problem.

As a result, it is important to be able to evaluate a particular case, including various criteria:

  • how much time is spent on rolling out a company-wide update that fixes a vulnerability.
  • how many computers are affected by the vulnerability.
  • how effectively the IT department works, and so on.

Jamf solves such problems. This tool allows you to automatically conduct an inventory of all devices connected to the system, recording all changes: logging applied policies, updates, and more.

Thanks to Jamf we literally collect certain metrics in one click. The metrics can be, for example, data on computer configuration, owner, software versions, installed programs, digital certificates, disk encryption, password information (whether a password is complex enough, when it was last changed), and so on.

In addition, an extended inventory attribute can even be the result of a script. For example, we have scripts that check whether a particular process is running on the system; the result of the script is returned to Jamf as a metric.
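
A minimal sketch of such a script-based attribute, written in Python for illustration. It assumes the Jamf convention that an extension attribute script prints its value between `<result>` tags; the process name `example-agent` and all function names are hypothetical, not taken from our actual scripts.

```python
import os
import subprocess


def list_process_names():
    """Return the base names of all running processes (uses the `ps` command)."""
    output = subprocess.run(
        ["ps", "-axo", "comm="], capture_output=True, text=True, check=True
    ).stdout
    return [os.path.basename(line.strip()) for line in output.splitlines() if line.strip()]


def is_process_running(name, process_names):
    """True if a process with exactly this name is in the list."""
    return name in process_names


def extension_attribute_result(running):
    """Format the value inside the <result> tags that Jamf reads from stdout."""
    return "<result>{}</result>".format("Running" if running else "Not running")


# Sample data for illustration; a deployed script would pass list_process_names().
sample = ["launchd", "Slack", "example-agent"]
print(extension_attribute_result(is_process_running("example-agent", sample)))
```

Keeping the process lookup separate from the formatting makes it easy to check the logic on sample data before the script is uploaded to Jamf.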

Let’s go back to the version example. If a software vendor announces a vulnerability, we create a search in Jamf using the criterion “version lower than N”. As a result, we get a report on how many computers have the vulnerable version, which devices they are, who owns each device, and in which office it is located. After that, there is no need to email all employees with a request to update.
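
The “version lower than N” criterion can be sketched outside Jamf as follows. This is an illustration only: the inventory records, field names and helper functions are hypothetical, and version strings are assumed to consist purely of dot-separated numbers.

```python
def parse_version(text):
    """Turn '5.4.9' into (5, 4, 9) so versions compare numerically, not as strings."""
    return tuple(int(part) for part in text.split("."))


def computers_below(inventory, app, minimum_version):
    """Return records whose installed version of `app` is lower than `minimum_version`."""
    threshold = parse_version(minimum_version)
    return [
        record for record in inventory
        if app in record["apps"] and parse_version(record["apps"][app]) < threshold
    ]


# Hypothetical inventory records; real data would come from Jamf.
inventory = [
    {"serial": "C02AAA", "owner": "alice", "office": "Boston", "apps": {"Zoom": "5.4.9"}},
    {"serial": "C02BBB", "owner": "bob", "office": "Kiev", "apps": {"Zoom": "5.10.1"}},
    {"serial": "C02CCC", "owner": "carol", "office": "Tokyo", "apps": {"Chrome": "96.0"}},
]

for record in computers_below(inventory, "Zoom", "5.5.0"):
    print(record["serial"], record["owner"], record["office"])
```

Comparing parsed tuples rather than raw strings matters here: as plain text, "5.10.1" would sort before "5.5.0" and be reported as outdated by mistake.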

The search results are converted into the scope of the policy. Then, with the blessing of the information security department, we start the process of enforcing software updates on those computers where they are required. And what’s more, there are tools to get statistics on policy enforcement in terms of “executed”, “pending”, or “failed”.
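
As an illustration of the kind of statistics involved, the sketch below counts computers per policy status. The status data and the function name are hypothetical; in practice these numbers come from Jamf itself.

```python
from collections import Counter


def summarize_policy_runs(statuses):
    """Count computers per status and compute the share that has executed the policy."""
    counts = Counter(statuses.values())
    total = len(statuses)
    executed_share = counts["executed"] / total if total else 0.0
    return counts, executed_share


# Hypothetical status per computer serial number.
statuses = {
    "C02AAA": "executed",
    "C02BBB": "pending",
    "C02CCC": "failed",
    "C02DDD": "executed",
}

counts, executed_share = summarize_policy_runs(statuses)
print(dict(counts), f"{executed_share:.0%} executed")
```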

Another example. From time to time, Apple announces service programs for the replacement and extended repair of certain components, for example a service program for keyboards, or programs for replacing defective batteries and solid-state drives. All service programs have expiration dates and apply to specific models and serial numbers.

Such a case is almost impossible to solve without Jamf. With the MDM system, it takes a few clicks to generate a report with the models or serial numbers of computers and, most importantly, information about who owns each laptop and where it is located. In the generated report we choose which metrics are shown, so as not to overload it with unnecessary details.

Here’s one more recent example of how we use Jamf effectively for inventory tasks. We set out to collect data about the battery status of employees’ laptops. The solution is as follows: a Jamf policy collects <health battery condition> data from endpoints, and a smart group is formed based on the <service recommended> value. On the first Monday of every month, the report generated in Jamf is sent to the company’s HelpDesk system as an email with an attachment. Each incoming email is automatically converted into an open ticket.
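
The two rules in this workflow, group membership by battery condition and the “first Monday of the month” schedule, can be sketched as below. The records and function names are hypothetical, and the exact condition string is an assumption based on the value named above.

```python
import datetime


def is_first_monday(day):
    """True if `day` is a Monday that falls within the first seven days of its month."""
    return day.weekday() == 0 and day.day <= 7


def needs_service(records, flagged_condition="Service Recommended"):
    """Return the records whose battery condition matches the flagged value."""
    return [r for r in records if r["battery_condition"] == flagged_condition]


# Hypothetical inventory data for illustration.
laptops = [
    {"serial": "C02AAA", "owner": "alice", "battery_condition": "Normal"},
    {"serial": "C02BBB", "owner": "bob", "battery_condition": "Service Recommended"},
]

report_day = datetime.date(2021, 11, 1)
if is_first_monday(report_day):
    for laptop in needs_service(laptops):
        print(laptop["serial"], laptop["owner"])
```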

We have been receiving these reports for several months now, and we found an unexpected effect: it is now much easier for the finance department to see upcoming service costs and plan service budgets on a monthly/quarterly basis.

Deployed IT Self-Service as employee-IT communication

With Jamf, we offered a new form of employee communication with IT through the IT Self-Service application. In fact, it is a portal for company employees to change the status quo in established business processes within the company.

Our position: IT Self-Service is an employee’s first IT companion and the first line of IT help. The main idea of this service is to create conditions to reduce the load on the IT-team and reduce the number of open tickets to HelpDesk. This means more efficient use of the company’s IT resources.

Let’s look at a few scenarios for its use.

Global access to printers. This is the first task that we implemented as an IT Self-Service concept. Let’s say an employee from one office comes on a business trip to another office in the company and needs to print a document. Previously, they first had to ask a colleague where the nearest printer was located, and then look up the IP address and other details of the device in order to send the file to print. To do this, they had to write a request in the Slack channel #info-tech, open a ticket in the IT HelpDesk, or look up the information on the company’s internal wiki. All options took time and did not make the work any more convenient.

With IT Self-Service, the task has been simplified: just go to the “Printers” section of the portal and install the necessary printer, with all the drivers and settings for the specific device, on your computer in one click. Each printer is labelled in “Office: Floor” format, for example, “Boston Office: 12th Floor Ricoh C307 Printer”.

Software. Programs are always at your fingertips. As a result of IT onboarding scripts, each employee gets a basic set of applications, but there are also a number of optional products that are in IT Self-Service. In addition, we get the opportunity to install a new software product that is planned for implementation within the company. A volunteer who wants to participate in the process of beta testing a product simply goes into IT Self-Service and installs the application with one click.

Lastly, the IT Self-Service portal has a user authorisation option, which allows you to target a particular software product to a particular focus group.

Simplified work processes

As everywhere else, our employees may run into a technical failure or issue at any time, for example, when some software stops working. In this case, a developer or other technical specialist is likely to be able to describe the problem in detail and give us the data we need to analyse it and get the service up and running again. Business users, however, usually cannot describe the problem in such depth.

If we all worked in one office, the IT HelpDesk specialists could come up and see what was wrong in person. But with a team spread across five continents, such a trick wouldn’t work.

So, we looked at IT Self-Service as a company IT resource that makes the lives of employees better, and implemented the following in it:

  • IT Service catalogue.
  • VPN Troubleshoot DNS resolution issues.
  • Say Hello to IT Squad (Sysdiagnose).

IT Service catalogue. This is an item in the IT Self-Service app that opens the service catalogue within the company’s HelpDesk in one click. There, employees can place an order, for example for access to a cloud service, a required software license, an additional monitor or accessory, and much more. You don’t even have to open a browser and look up the resource’s web address to place an order. One click on the application icon and you’re already inside the service catalogue.

VPN Troubleshoot DNS resolution issues. We’ve developed a separate application to help users deal with VPN failures. The application is inside IT Self-Service and collects, in one click, all the metrics required to analyse the network connection at the endpoint, for example: nameservers, nslookup, traceroute, ping test, and so on.

When the application finishes, it saves a report file to the user’s desktop. From there, the file just needs to be sent to IT. This makes it possible to understand exactly what the problem is and quickly find a solution.
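
A rough sketch of how such a diagnostics collector could be structured is shown below. It is not our actual application: the host name `vpn.example.com`, the choice of commands and all function names are assumptions made for illustration.

```python
import subprocess


def diagnostic_commands(host):
    """Return (section title, command) pairs; the selection here is illustrative."""
    return [
        ("nameservers", ["scutil", "--dns"]),
        ("nslookup", ["nslookup", host]),
        ("ping", ["ping", "-c", "3", host]),
        ("traceroute", ["traceroute", host]),
    ]


def run_command(command):
    """Run one command and return its output, or a short note if it cannot run."""
    try:
        done = subprocess.run(command, capture_output=True, text=True, timeout=60)
        return done.stdout + done.stderr
    except (OSError, subprocess.TimeoutExpired) as error:
        return f"could not run: {error}\n"


def build_report(host, runner=run_command):
    """Collect the output of every diagnostic command into one text report."""
    sections = []
    for title, command in diagnostic_commands(host):
        sections.append(f"=== {title}: {' '.join(command)} ===\n{runner(command)}")
    return "\n".join(sections)


# Demonstration with a stand-in runner, so nothing touches the network here.
print(build_report("vpn.example.com", runner=lambda command: "(command output)\n"))
```

In a real tool, the default `run_command` runner would be used and the returned text written to a file on the user’s desktop. Passing the runner as a parameter keeps the report layout testable without running any network commands.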

Say Hello to IT Squad (Sysdiagnose). Sysdiagnose is a macOS diagnostic tool developed by Apple that collects data about the device, files, and system. It generates files that allow the IT department to investigate problems on an employee’s remote computer and improve the company’s IT infrastructure.

As soon as the Sysdiagnose IT Self-Service application is launched, a file is created on the employee’s desktop as a tar-archive in the following format: <sysdiagnose_year.month.day_time._Mac-OS-X_MacBookPro … tar>. It contains all the diagnostic information, and as a result it allows you to significantly optimise the response time to the user’s request, as well as quickly and correctly solve the problem.

Another case that we were able to solve was bringing the meeting rooms of all the offices to the same standard. Our offices have a total of about 200 meeting rooms, and each of them has a big screen for meetings with employees from other locations. The media centre of this system is a Mac mini, which is essentially set up as a self-service kiosk.

In our case, a meeting room is a Google Calendar resource. To use a meeting room, an employee or group of employees books the specific resource through their Google Calendar. Typically, meetings are held there one after the other, sometimes without a break, especially when regional and Boston office hours overlap.

The first thing a person in a meeting room does is launch Chrome, open Google Calendar, select their meeting, and click the link to start the video conference. But if the previous user of that meeting room has accidentally closed the browser and removed it from the Dock, or removed the Sound Volume or Bluetooth icon from the menu bar, the next user may have trouble finding and opening the application. In addition, if the Bluetooth icon is not displayed in the menu bar, the person cannot tell whether the Bluetooth keyboard needs to be charged, because they cannot quickly open the settings to see what is wrong.

That’s why we undertook the task of providing fail-safe operation of the meeting rooms. Each of them should have a strictly fixed set of unified elements, working on the principle of a self-service kiosk, where nothing can be “broken”.

Once again Jamf came to our aid. We implemented most of the requirements for meeting rooms through policies and configuration profiles:

  • Locked the icons of key services like Zoom, Chrome, and Cisco Webex in the Dock. The user can add any other required items, but cannot remove the basic ones.
  • Unified the desktop wallpaper in the meeting rooms (DataRobot branding), with no ability for the user to change it.
  • Blocked the Mac mini from going to sleep and from using screensavers.
  • A Jamf policy in the form of a script checks every 15 minutes whether the Sound Volume and Bluetooth icons are in the menu bar and puts them back if a person has accidentally removed them.
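
The core of the menu bar check in the last item can be sketched as a comparison between the required and the current set of items. This is a simplified illustration, not our production script: the item paths follow the layout of older macOS releases and should be treated as assumptions, and reading and writing the actual menu bar settings is deliberately left out because it depends on the macOS version.

```python
# Paths as found on older macOS releases; treat them as assumptions for your OS version.
REQUIRED_MENU_EXTRAS = [
    "/System/Library/CoreServices/Menu Extras/Volume.menu",
    "/System/Library/CoreServices/Menu Extras/Bluetooth.menu",
]


def missing_menu_extras(current, required=REQUIRED_MENU_EXTRAS):
    """Return required items that are absent from the current menu bar list, in order."""
    present = set(current)
    return [item for item in required if item not in present]


def restored_menu_extras(current, required=REQUIRED_MENU_EXTRAS):
    """Return the current list with any missing required items appended."""
    return list(current) + missing_menu_extras(current, required)


# Hypothetical state after a user removed the Bluetooth icon.
current = ["/System/Library/CoreServices/Menu Extras/Volume.menu"]
print(missing_menu_extras(current))
print(restored_menu_extras(current))
```

The 15-minute interval itself is not part of the script; it is set by how often the policy is scheduled to run.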

We plan to implement a project to collect data on keyboard and trackpad battery levels from the Mac mini computers used in meeting rooms throughout the company’s offices. A Jamf policy will collect the battery levels and generate a report listing the meeting rooms in which the peripheral Bluetooth devices are below a critical charge level, and a ticket will automatically be opened in HelpDesk.

Fundamentally changed IT onboarding

Last year, the company’s staff almost doubled. For each new employee, we had to prepare a working laptop with the necessary software for business users. And if the new employee is an engineer, their computer must also have a development environment with a deployed prototype of the DataRobot application.

There were weeks when up to 10 new employees went through the onboarding process in the Kiev office. Our record is 100+ computers prepared by the IT team in the Boston office when DataRobot acquired Paxata.

Since classical DevOps engineers were at the origin of the company’s IT onboarding automation, the computer preparation scenario was implemented with Ansible, one of the most popular DevOps configuration management systems. Ansible is written in Python, and its playbooks use the declarative YAML format. The approach was sound because it solved the problem of preparing computers for both macOS and Ubuntu, with platform-dependent branching in the deployment script. It soon became clear that macOS lacked a classic package manager of the kind familiar from the Linux world, so the DevOps engineers started using the Homebrew package manager, which is distributed as free and open-source software.

It seemed that there was no need for a graphical interface for computer preparation, everything was moved to the command line and all automation issues were closed with Ansible. But using this approach also revealed a number of problems. The number of hours spent on supporting this automation began to skyrocket.

As a result, instead of concentrating their efforts on supporting the DataRobot application development environment, DevOps engineers were forced to spend their time supporting the Ansible playbook, part of which installs a standard list of applications: Chrome, Firefox, Microsoft Office, Zoom, Tunnelblick, Sublime Text and others. Consequently, each new macOS release was no cause for joy and enthusiasm: behind it were long hours of work to adapt existing scripts to the new version, while Canonical also periodically treated its users to new Ubuntu releases.

We are gradually rethinking the onboarding process and moving entire phases into Jamf.

We are now in the process of splitting the preparations for onboarding macOS laptops into two phases:

  • installing the software that all employees (business users and engineers) use: Chrome, Microsoft Office, Slack, Tunnelblick, Zoom, Sublime Text, and so on.
  • installing additional software to create and populate all the dependencies of the development environment, which only developers or DevOps engineers use.

The first phase is the responsibility of the IT department, the second that of the DevOps team. The first is for everyone, the second is for engineers. The first uses Jamf, the second uses Ansible. That said, the plan is also to support and roll out new, purely engineering dependencies to endpoints already used by engineers. It will be a joint project between IT and DevOps, where the “data” is the DevOps area, deployment is the responsibility of the IT department, and Jamf is the tool for delivering and deploying the “container”.

One of the peculiarities of using Jamf for computer preparation is the ability to apply policies. A policy’s scope can be a smart group created according to certain criteria. This, in turn, allows you to use a specific set of Jamf event triggers to start a policy, a script, the installation of a package, or possibly an entire scenario that consists of many components.

For example, this scenario automatically installs the Cortex XDR agent (Palo Alto product) as the next step after adding a computer to Jamf (computer enrolment), because the computer automatically gets into the smart group of computers that do not have this software installed.
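
The logic of this scenario can be modelled in a few lines. The sketch below is a conceptual illustration only: the data structures and function names are hypothetical, and in reality the group membership is evaluated by Jamf from inventory data.

```python
def lacks_cortex_xdr(computer, required_app="Cortex XDR"):
    """Smart group rule: the computer does NOT have the required application installed."""
    return required_app not in computer["installed_apps"]


def policies_to_run(computer, policies):
    """Return the names of policies whose scope rule currently matches the computer."""
    return [name for name, rule in policies.items() if rule(computer)]


# One policy scoped to the smart group defined by the rule above.
policies = {"Install Cortex XDR": lacks_cortex_xdr}

# A newly enrolled computer matches the group, so the policy applies.
new_mac = {"serial": "C02NEW", "installed_apps": ["Safari"]}
print(policies_to_run(new_mac, policies))

# After installation the computer no longer matches, so the policy stops applying.
new_mac["installed_apps"].append("Cortex XDR")
print(policies_to_run(new_mac, policies))
```

Because membership is recalculated from the inventory, no one has to remove the computer from the group by hand after the agent is installed.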

In our movement to renovate the IT onboarding process, much has already been done, more is yet to come.

In the near future, we plan to integrate with the Okta service (single sign-on authentication of an employee when they first get access to the laptop), which will bring us closer to Zero-Touch Provisioning as the most efficient way to remotely deploy computers and mobile devices. This means that users will receive their laptops directly from Apple’s warehouses, and the devices will not pass through our IT department at all. In other words, we will be able to send unprepared devices to remote employees, which will greatly speed up the onboarding process.

In conclusion

Jamf is probably the best IT infrastructure management solution that we have implemented recently. Thanks to this tool, we have improved the automation of internal IT processes, as well as brought the company closer to meeting CIS Benchmarks security standards.

Obviously, the reality of implementation has revealed some limitations and shortcomings of the existing solution. At certain points we fully agreed, as did many other Jamf Nation users*, that Jamf developers need to revise certain parts of the user navigation and user interface elements (Graphical UI design).

*Jamf Nation is the world’s largest community of Apple IT managers, where you can network with other IT professionals, learn new things about deploying Apple devices, and share ideas, cutting-edge or otherwise.

For example, the basic entities in Jamf are configuration profiles and policies, each with its own scope. The scope of a policy can be either individual computers, or static or smart groups with the possibility of complex logic to select the right set of objects for applying the policy (<targets> / <limitations> / <exclusions>). However, there is no explicit possibility to visualize the set of policies and configuration profiles in the context of a group of computers.

There are exactly two ways of solving this problem. The first one is to write a complex script, for example in Bash or Python which would include processing a number of requests via Jamf API (with subsequent visualization in the form of HTML file, let’s say). The second is to join a petition on the Jamf Nation forum to add this functionality to future versions of Jamf.
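
To give an idea of what the first option involves, here is a sketch of the processing step only, without any API calls. It uses a deliberately simplified model in which targets, limitations and exclusions are all expressed as named groups of computers; that model, the sample data and the function names are assumptions for illustration, and real Jamf scopes are richer than this.

```python
def effective_scope(policy, groups):
    """Resolve a scope to computers: targets, narrowed by limitations, minus exclusions."""
    def members(names):
        found = set()
        for name in names:
            found |= groups[name]
        return found

    computers = members(policy["targets"])
    if policy["limitations"]:
        computers &= members(policy["limitations"])
    return computers - members(policy["exclusions"])


def policies_by_computer(policies, groups):
    """Invert the policy-to-scope mapping into a per-computer list of policy names."""
    result = {}
    for policy in policies:
        for computer in effective_scope(policy, groups):
            result.setdefault(computer, []).append(policy["name"])
    return {computer: sorted(names) for computer, names in sorted(result.items())}


# Hypothetical groups and policies; real data would be fetched through the Jamf API.
groups = {
    "All Macs": {"C02AAA", "C02BBB", "C02CCC"},
    "Boston": {"C02AAA", "C02BBB"},
    "Meeting Rooms": {"C02CCC"},
    "Has XDR": {"C02BBB"},
}
policies = [
    {"name": "Install Cortex XDR", "targets": ["All Macs"],
     "limitations": [], "exclusions": ["Has XDR"]},
    {"name": "Boston printers", "targets": ["All Macs"],
     "limitations": ["Boston"], "exclusions": ["Meeting Rooms"]},
]

for computer, names in policies_by_computer(policies, groups).items():
    print(computer, names)
```

The resulting per-computer mapping is the piece that could then be rendered as an HTML table or any other visual form.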

Another bottleneck is Jamf Patch Management. With all the rich functionality, IT administrators still have to manually create (“repackage”) update packages, for example using Jamf Composer, with subsequent uploading to Jamf Software Server (JSS). However, anyone who decides to automate this process will look to the third-party solution AutoPkg, an automation environment for packaging and distributing macOS software that focuses on tasks that would normally be performed manually to prepare software for mass deployment to managed clients.

Finally, the last item on the wish list: there is no built-in integration with a version control system such as GitHub. So, in this case, you’ll have to look at a third-party solution, git2jss. This is an asynchronous Python library for easy synchronisation of your scripts in Git with your JSS, allowing IT admins to keep their scripts in the version control system to simplify the update process.

But positive emotions ultimately prevail over negativity.

If we evaluate the effectiveness of implementing Jamf as a Mobile Device Management solution, it has opened up opportunities that we had never thought of before. We have yet to explore these new horizons.

We will probably rethink our reporting capabilities based on Jamf data. We are already looking at products like Splunk, which is a log storage and analysis system. Maybe we’ll work with Chartio, an online service for data visualisation and business intelligence that is already widely used internally.

In addition, the success of the Jamf project inspired us to start the Ubuntu Landscape project, which aims to implement Mobile Device Management for the Linux platform.

Our top managers have ambitious plans for the company’s growth. Now we are fully prepared to scale – both the staff and the fleet of devices. And with Jamf, we don’t have to build up the size of our IT department.
