The other day I came across a file with a peculiar format, and later I needed to find it again. I ran into two problems:

  1. Size: It's one of 20,000,000 XML files
  2. Search: I need to find a file containing the string "xocs:doi", which is not treated as a single word (because of the colon character, of course)

My case involves searching through XML files, but the method is generic enough to work for any text files.

TL;DR: Use the power of findstr, Luke!

First Option - Windows Explorer

Searching through Windows Explorer is trivial: you go to the search bar and type your string. If you want to search for a phrase, you type it in double quotes, like so: "hello world", and click the search button.

However, the behaviour is inconsistent:

  • Indexed directories: the result is a list of files containing the exact phrase you searched for
  • Un-indexed directories: the result is a list of files containing all the words in the phrase, not necessarily next to each other

Example:

Assuming we have two files in the same directory:

test1.txt

hello world

and

test2.txt

hello dear world

We type the text "hello world" (including the quotes) and launch the search in Explorer.

  • If the directory is indexed, the result will contain only test1.txt
  • If the directory has no index, the result will have both test1.txt and test2.txt
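
The difference between the two behaviours can be modelled in a few lines of Python. This is only a sketch of the matching rules described above, not of how Explorer works internally; the function names are my own, and the case-insensitive comparison is an assumption.

```python
def matches_phrase(text: str, phrase: str) -> bool:
    """Exact-phrase match: the phrase must appear as one contiguous run."""
    return phrase.lower() in text.lower()


def matches_all_words(text: str, phrase: str) -> bool:
    """All-words match: every word of the phrase appears somewhere in the text."""
    words_in_text = set(text.lower().split())
    return all(word in words_in_text for word in phrase.lower().split())


# The two example files from above, as name -> content.
files = {
    "test1.txt": "hello world",
    "test2.txt": "hello dear world",
}

phrase = "hello world"
phrase_hits = [name for name, text in files.items() if matches_phrase(text, phrase)]
word_hits = [name for name, text in files.items() if matches_all_words(text, phrase)]

print("exact phrase:", phrase_hits)  # ['test1.txt']
print("all words:", word_hits)       # ['test1.txt', 'test2.txt']
```

The all-words rule is what produces the false positive: test2.txt contains both words, just not side by side.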

In my case, indexing 20M files is out of the question (time and disk space constraints), so searching via Explorer yields (a lot of) false positives.

Second Option - Command Line

Windows has a findstr command that finds strings in multiple files. It's somewhat like find/grep in UN*X. Have a look at its documentation.

The command I use to find the string above in all files is:

findstr /s /m /c:"xocs:doi" *.*

What it does is:

  • /s - look through the current directory and all of its subdirectories
  • /m - print only the name of each file that contains a match
  • /c:"xocs:doi" - use the quoted text as a literal search string
  • *.* - check all files
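
For comparison, here is roughly what that command does, sketched in Python. It is an illustration under assumptions, not a replacement for findstr: the function name files_containing is made up, the search is a case-sensitive byte comparison, and each file is read fully into memory, which is fine for small XML files but not for huge ones.

```python
import os


def files_containing(root: str, needle: str) -> list:
    """Return the paths of all files under root whose contents include needle."""
    needle_bytes = needle.encode("utf-8")
    hits = []
    # os.walk descends into every subdirectory, like /s.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as handle:
                    # Substring test on the raw bytes, like a literal /c: search.
                    if needle_bytes in handle.read():
                        # Record only the file name, like /m.
                        hits.append(path)
            except OSError:
                # Skip files that cannot be opened or read.
                continue
    return hits
```

Calling files_containing(".", "xocs:doi") from the top-level directory returns the list of matching file paths.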

Conclusion

The most accurate way to search for exact, arbitrary strings in Windows is the findstr command.