Crawling remote Windows computers with Cognitive Toolkit
Shinydocs Cognitive Toolkit was designed to run in a server environment: text extraction, hash generation, OCR, and similar tasks require substantial processing power, and the toolkit needs access to all of the data being analyzed. What if you want to crawl data on users’ computers?
Typically, you would deploy an app to those computers using your organization's standard software deployment methods. While this may work for typical applications, text extraction and other compute-heavy processes make the deployment approach less appealing. Text extraction will typically use all available CPU resources, leaving the user’s machine slow and sometimes unusable while it processes the data. On top of that, each machine would need firewall rule changes, scheduling, retry handling, and other per-machine work, which makes this approach even less palatable.
How do you index data from users’ machines, then? By leveraging Windows built-in administrator shares! Your organization likely already uses them to manage users’ PCs. This Windows feature allows remote access to PCs on your domain/network using a UNC path like this:
\\<machine_name>\<drive_letter>$
\\laptop-1\c$
You may need to check with your organization’s IT team about enabling this feature, but it is typically enabled by default in domain environments. This will enable you to crawl these machines remotely, as long as the machine is powered on and connected to the network and domain.
You can use your Active Directory management tools to generate the list of machine names registered on your domain. Once you have that list, convert each machine name into a UNC path that points to its administrator share:
List of machines | List of machines with UNC administrator share |
---|---|
ACMEUSR0001 | \\ACMEUSR0001\c$ |
ACMEUSR0002 | \\ACMEUSR0002\c$ |
ACMEUSR0003 | \\ACMEUSR0003\c$ |
This list of UNC paths can be used directly with the Cognitive Toolkit’s CrawlFileSystem tool using the --path-file argument.
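Converting machine names into UNC paths can be scripted. The following is a minimal Python sketch, not part of the Cognitive Toolkit. It assumes the machine names are exported to a plain text file with one name per line, and that the path file should contain one UNC path per line (check the CrawlFileSystem documentation for the exact format it expects). The function names and file names are hypothetical.

```python
from pathlib import Path


def to_admin_share(machine_name: str, drive_letter: str = "c") -> str:
    """Build the administrator-share UNC path for one machine and drive."""
    # Two leading backslashes, the machine name, then <drive_letter>$
    return f"\\\\{machine_name.strip()}\\{drive_letter}$"


def build_path_list(machine_names, drive_letter: str = "c"):
    """Convert machine names to UNC paths, skipping blanks and duplicates."""
    seen = set()
    paths = []
    for raw in machine_names:
        name = raw.strip()
        # Windows machine names are case-insensitive, so compare in upper case
        if not name or name.upper() in seen:
            continue
        seen.add(name.upper())
        paths.append(to_admin_share(name, drive_letter))
    return paths


def write_path_file(machines_file: str, output_file: str, drive_letter: str = "c"):
    """Read machine names (one per line) and write UNC paths (one per line).

    The one-path-per-line output format is an assumption.
    """
    names = Path(machines_file).read_text(encoding="utf-8").splitlines()
    paths = build_path_list(names, drive_letter)
    Path(output_file).write_text("\n".join(paths) + "\n", encoding="utf-8")
    return paths


if __name__ == "__main__":
    # Machine names taken from the example table above
    for path in build_path_list(["ACMEUSR0001", "ACMEUSR0002", "ACMEUSR0003"]):
        print(path)
```

Calling `write_path_file("machines.txt", "paths.txt")` would produce a file that could then be supplied to the --path-file argument. Machines with additional drives can be covered by running the conversion again with a different `drive_letter`.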
Are you a Shinydocs Discovery Search customer? This is the required method for enabling users to search their own machines with Discovery Search.