UiPath Studio Guide

Citrix Specific Automation Techniques and Tools

Citrix Recording Wizard

The easiest way to automate in virtualized environments is to use the specialized Citrix Recorder, which automatically generates fully configured activities based on the user’s actions. It also facilitates techniques such as Relative Scraping. The Recording Wizard is designed to simulate human behaviour and relies on activities and technologies specific to virtual environment automation, such as OCR and Image Recognition activities.

Sometimes, the automatically generated selectors propose volatile attribute values to identify elements. This means activities might not work properly in all circumstances, and manual intervention is required to calibrate the selectors. A reliable selector should successfully identify the same element every time in all conditions, regardless of external changes in resolution or UI element position.

Opening Applications in Citrix

Usually, apps are opened by clicking their shortcut or executable file. The location of these files can normally be identified by several means, such as screen coordinates or selectors.

In virtualized environments, these means of identifying the shortcut’s location are unavailable, so Image and OCR activities must be used to locate the shortcut or executable file before clicking it. Because these activities rely on image and text recognition, slight graphical differences, such as a change in resolution or a highlighted icon, can cause the identification to fail. One solution is to select an area of the icon that does not include any portion of the background image, such as the center of the icon.
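The idea of selecting only the center of an icon can be sketched in a few lines of Python. This is an illustration of the geometry only, not UiPath code: the `Rect` type, the `center_region` helper, and the default `fraction` of 0.5 are all hypothetical names and values chosen for this example.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    """Hypothetical rectangle type: top-left corner plus size, in pixels."""
    left: int
    top: int
    width: int
    height: int


def center_region(icon: Rect, fraction: float = 0.5) -> Rect:
    """Return a rectangle centered inside `icon` that spans `fraction` of each side.

    Keeping only the middle of the icon leaves out the edges, where the
    desktop background or a selection highlight would show through.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in the range (0, 1]")
    width = max(1, round(icon.width * fraction))
    height = max(1, round(icon.height * fraction))
    left = icon.left + (icon.width - width) // 2
    top = icon.top + (icon.height - height) // 2
    return Rect(left, top, width, height)
```

For a 48 x 48 icon located at (100, 200), `center_region(Rect(100, 200, 48, 48))` yields the 24 x 24 area starting at (112, 212), which is the kind of region one would capture as the reference image.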

A best practice for opening applications in virtual machine environments is to create a shortcut for the application on the desktop of the machine, assign it a hotkey, and then send that hotkey to the remote desktop connection window with a Send Hotkey activity. It is recommended to use a more complex key combination for the shortcut, to avoid conflicts with existing ones.

Another safe way to start apps in virtual environments is by using the Command Prompt. For example, you can send the path of the application to the Command Prompt terminal with the Send Hotkey and Type Into activities. This method also enables you to input arguments for the app to be opened.
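As a rough sketch of the Command Prompt approach, the text typed into the terminal is simply the application path followed by its arguments, with quotes around anything that contains a space. The helper below is a hypothetical illustration in Python (`build_launch_text` is not a UiPath function, and the path in the usage note is made up); in a workflow, the resulting string would be the text passed to Type Into.

```python
from typing import Optional, Sequence


def build_launch_text(app_path: str, args: Optional[Sequence[str]] = None) -> str:
    """Build the line to type into Command Prompt to start an application.

    Tokens containing spaces are wrapped in double quotes so that the
    terminal treats each of them as a single path or argument.
    """
    def quote(token: str) -> str:
        return f'"{token}"' if " " in token else token

    tokens = [app_path, *(args or [])]
    return " ".join(quote(token) for token in tokens)
```

For example, `build_launch_text(r"C:\Program Files\Demo\demo.exe", ["/open", "my file.txt"])` produces `"C:\Program Files\Demo\demo.exe" /open "my file.txt"`, after which the Enter key would be sent to run it.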

Waiting for Certain States of Applications

There are situations when waiting for a certain state of an application is essential to creating an optimal automation. In desktop environments, UiPath activities are configured to wait for certain states before acting, as Studio has direct access to the operating system and can understand applications on a logical level.

In virtual environments, Studio does not have access to the underlying elements of the operating system, so other methods must be employed to identify application states.

To make sure an application is fully loaded before interacting with it, you must identify visual elements that show the page or app has finished loading, such as specific pictures or buttons. You can use the On Image Appear and Find Image activities to monitor the virtual environment and allow the project to continue only when a certain UI element appears. A better and more general solution is to wait for the application’s loading icon to disappear, if one exists. An On Image Vanish activity can be used for this purpose, allowing the automation to continue only when the loading icon vanishes.

Adding a Delay activity to your project is a bad practice for waiting on an application to load. This method is prone to failure because loading times for software programs can vary due to many factors, so a fixed delay may be too short in some runs and needlessly long in others.
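The difference between waiting for a visual state and waiting for a fixed time can be illustrated with a small polling loop. The Python sketch below is not how the On Image Appear or On Image Vanish activities are implemented; it only shows the general pattern of checking a condition repeatedly until it holds or a timeout expires. The `wait_until` function and the `loading_icon_visible` check mentioned afterwards are hypothetical.

```python
import time
from typing import Callable


def wait_until(
    condition: Callable[[], bool],
    timeout: float = 30.0,
    interval: float = 0.5,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `condition` until it returns True or `timeout` seconds elapse.

    Returns True as soon as the condition holds and False on timeout, so
    the caller waits only as long as the application actually needs.
    """
    deadline = clock() + timeout
    while True:
        if condition():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

Waiting for an element to appear would be `wait_until(button_visible)`, and waiting for a loading icon to vanish would be `wait_until(lambda: not loading_icon_visible())`, where both checks stand in for an image search on the virtual machine window.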

Identifying UI Elements

Since virtual environments offer no way to identify UI elements via standard means, visual anchors are the only remaining option. UiPath Studio features activities based on OCR or Image Recognition technologies that are meant for such situations.

There are several OCR engines that can be used with UiPath Studio: Google Tesseract, Microsoft MODI and ABBYY. The Google Tesseract engine works better for scraping smaller areas, while Microsoft MODI is more suitable for larger ones.

Inserting Data in Citrix

As explained earlier, clicking UI elements in virtual environments can be tricky, due to changes in resolution or background colors. Inserting data in Citrix reliably therefore means using methods that are not prone to failure, such as keyboard shortcuts and hotkeys sent to the virtual machine window, in order to avoid clicking.

Relative Click is a technique that enables you to click UI elements by using other buttons or labels around them as anchors. In situations where selectors cannot be found, the target UI objects are identified by using image recognition activities to look for adjacent visual labels or other such elements.
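The geometry behind a relative click is simple: once image recognition has located the anchor, the click position is the anchor's position plus a fixed offset. The Python sketch below illustrates that calculation only; the `Rect` type and `relative_click_point` helper are hypothetical, and measuring the offset from the anchor's center is an assumption made for this example.

```python
from typing import NamedTuple, Tuple


class Rect(NamedTuple):
    """Hypothetical rectangle type: top-left corner plus size, in pixels."""
    left: int
    top: int
    width: int
    height: int


def relative_click_point(anchor: Rect, offset_x: int, offset_y: int) -> Tuple[int, int]:
    """Return the point to click, measured from the center of the anchor.

    `anchor` is where the label or button image was found on screen;
    the offsets are recorded once, when the automation is designed.
    """
    center_x = anchor.left + anchor.width // 2
    center_y = anchor.top + anchor.height // 2
    return (center_x + offset_x, center_y + offset_y)
```

If a label is found at `Rect(40, 100, 60, 20)` and the text field sits 120 pixels to the right of its center, `relative_click_point(Rect(40, 100, 60, 20), 120, 0)` gives `(190, 110)`. Because the offset is fixed, this only works while the layout keeps the anchor and the target at the same distance.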

A good way to insert data from a machine into a virtual environment is using the shared clipboard. This method has the advantage that it can easily paste data into the virtual machine by first clicking the app to be automated and sending it the Ctrl + V hotkey.

To avoid having to identify the UI elements’ location in order to click them, it is recommended to switch between buttons and text fields by using Tab, Enter and the navigation keys. Another very useful activity for typing text in virtual machines is Type Into, because it interacts with the application by sending keystrokes, just like a human user would.


Using Tab to switch between UI elements can sometimes be unreliable, as updates that change the UI layout can cause automations to stop working correctly. It is recommended to watch for such layout changes and to update the projects accordingly. Also, sending Tab keystrokes too quickly may cause some of them not to be received by the target app, in which case it is recommended to use the Delay activity to increase the interval between keystrokes.
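Pacing keystrokes amounts to inserting a short pause between consecutive keys. The Python sketch below shows the pattern under stated assumptions: `send_keys_paced` is a hypothetical helper, `send` stands in for whatever actually delivers a key to the remote window, and the default pause of 0.2 seconds is an arbitrary value to be tuned for the target application.

```python
import time
from typing import Callable, Iterable


def send_keys_paced(
    keys: Iterable[str],
    send: Callable[[str], None],
    pause: float = 0.2,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Send each key through `send`, pausing between consecutive keys."""
    for index, key in enumerate(keys):
        if index > 0:
            # Give the remote session time to process the previous key.
            sleep(pause)
        send(key)
```

Calling `send_keys_paced(["tab", "tab", "enter"], send=press_key)` would deliver the three keys with two pauses in between, where `press_key` is a placeholder for the real key-sending mechanism.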

If using keyboard commands to navigate through UI elements is not an option, then image and text recognition are the alternative for automating in virtual environments. Image recognition has its own weaknesses, being sensitive to environment variations such as changes in desktop theme or screen resolution. When the application runs in Citrix, the resolution should be kept greater than or equal to the one used when recording the automations. Otherwise, small image distortions can be compensated for by slightly lowering the Accuracy property of the image activities. Check how the application layout adjusts itself to different resolutions to ensure that visual elements stay close to each other, especially in the case of coordinate-based techniques like Relative Clicking and Relative Scraping. To enable the automation to support different resolutions, parallel recordings can be placed inside a Pick Branch activity, and the one suited to the current resolution can be chosen automatically.
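One possible selection rule for parallel recordings is sketched below in Python. It is not how the Pick Branch activity decides; it only illustrates the guideline that the runtime resolution should be at least as large as the recording resolution. The `pick_recording` helper, the recording names, and the tie-breaking rule (prefer the largest recording that still fits) are assumptions made for this example.

```python
from typing import Dict, Tuple

Resolution = Tuple[int, int]  # (width, height) in pixels


def pick_recording(recordings: Dict[Resolution, str], current: Resolution) -> str:
    """Choose the recording made at the largest resolution that fits `current`.

    A recording fits when both its width and its height are less than or
    equal to the current resolution.
    """
    fitting = [
        res for res in recordings
        if res[0] <= current[0] and res[1] <= current[1]
    ]
    if not fitting:
        raise LookupError(f"no recording fits resolution {current}")
    best = max(fitting, key=lambda res: res[0] * res[1])
    return recordings[best]
```

With recordings made at 1024 x 768 and 1920 x 1080, a session running at 1366 x 768 would fall back to the 1024 x 768 recording, while a session at 1920 x 1080 would use the larger one.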

Retrieving Data from Citrix

Retrieving data from a virtual environment has its own limitations, as the Native and FullText scraping methods that retrieve text directly from the operating system do not work in virtual environments. Thus, OCR-based activities are essential in scraping the screen of the Citrix machine.

Just like in the case of data input, the shared clipboard is a useful and reliable tool in retrieving text from Citrix, as it can be easily accessed by sending the Ctrl + C hotkey to the window via the Send Hotkey activity.

The Copy Selected Text activity can also copy text from the virtual machine environment, and it behaves very similarly to using the shared clipboard.

Additionally, relative scraping is a useful technique that enables you to retrieve text from UI elements by using OCR technology, relative to anchors in the window, such as text box labels or buttons.
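Relative scraping can be pictured as computing the screen area to hand to the OCR engine from the position of an anchor. The Python sketch below shows only that calculation and is not UiPath code: the `Rect` type, the `relative_region` helper, and the choice to clip the area to the screen bounds are assumptions made for illustration.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    """Hypothetical rectangle type: top-left corner plus size, in pixels."""
    left: int
    top: int
    width: int
    height: int


def relative_region(
    anchor: Rect, dx: int, dy: int, width: int, height: int, screen: Rect
) -> Rect:
    """Return the area to scrape, offset from the anchor's top-left corner.

    The area is clipped to `screen` so that OCR is never asked to read
    pixels outside the visible window.
    """
    left = max(screen.left, anchor.left + dx)
    top = max(screen.top, anchor.top + dy)
    right = min(screen.left + screen.width, anchor.left + dx + width)
    bottom = min(screen.top + screen.height, anchor.top + dy + height)
    if right <= left or bottom <= top:
        raise ValueError("the requested region lies outside the screen")
    return Rect(left, top, right - left, bottom - top)
```

For a label found at `Rect(40, 100, 60, 20)` on a 1024 x 768 screen, `relative_region(Rect(40, 100, 60, 20), 70, 0, 200, 20, Rect(0, 0, 1024, 768))` gives `Rect(110, 100, 200, 20)`, the strip just to the right of the label where the field's value would be read.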