File search algorithms

Better file search algorithm than creating a list of files

For a project I'm doing, I wrote a Java program that searches for a file specified by user input. If you are using Java 7, the java.nio.file.Files class can handle the traversal for you. I think there are two ways to do it:

1. Search the file hierarchy linearly, "on the go", which is the way you are doing it.
2. Load the data into a data structure, such as a binary tree, and search the data there.

The downside of the first approach is that traversing the entire hierarchy can take a long time, and you pay that cost again on every search.
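A minimal sketch of the first approach using Files.walkFileTree, assuming Java 7+; the starting directory and target filename in main are placeholders:

    import java.io.IOException;
    import java.nio.file.*;
    import java.nio.file.attribute.BasicFileAttributes;

    public class FindFile {
        // Walks the tree and stops at the first file whose name matches.
        static Path find(Path root, final String name) throws IOException {
            final Path[] result = {null};
            Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
                @Override
                public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                    if (file.getFileName().toString().equals(name)) {
                        result[0] = file;
                        return FileVisitResult.TERMINATE; // stop as soon as we find it
                    }
                    return FileVisitResult.CONTINUE;
                }
            });
            return result[0];
        }

        public static void main(String[] args) throws IOException {
            // Placeholder arguments: search the current directory for "notes.txt".
            System.out.println(find(Paths.get("."), "notes.txt"));
        }
    }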

The downside of the second approach is that loading the full hierarchy into the data structure can take a long time, but then you can search it many times, although you need to refresh it periodically as the file system changes. What you're doing is a breadth-first search; searching each directory's contents immediately as you reach it would be a depth-first search. Neither is better in general; the choice only determines the order in which the tree is visited.
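For contrast, a breadth-first version of the same search keeps an explicit queue of directories still to visit; swapping the queue for a stack would give depth-first order. A sketch, with the same placeholder caveats as above:

    import java.io.IOException;
    import java.nio.file.*;
    import java.util.ArrayDeque;
    import java.util.Queue;

    public class FindFileBfs {
        // Visits directories level by level (breadth-first).
        static Path find(Path root, String name) throws IOException {
            Queue<Path> queue = new ArrayDeque<>();
            queue.add(root);
            while (!queue.isEmpty()) {
                Path dir = queue.remove();
                try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
                    for (Path entry : entries) {
                        if (Files.isDirectory(entry)) {
                            queue.add(entry);  // visit this directory later (BFS order)
                        } else if (entry.getFileName().toString().equals(name)) {
                            return entry;      // first match wins
                        }
                    }
                }
            }
            return null; // not found anywhere under root
        }
    }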

There is also iterative deepening search, which combines the low memory use of depth-first search with the level-by-level order of breadth-first search.

When the items being searched are sorted, binary search applies. Suppose we probe the middle of a sorted array and find that the value at location 4 is 27, which is not a match.

Because the target value is greater than 27 and the array is sorted, we know the target must be in the upper portion of the array. Probing the middle of that portion, the value stored at location 7 is not a match either; rather, it is greater than what we are looking for, so the target must lie in the lower part, below this location. Each probe halves the searchable range, so binary search reduces the number of comparisons from linear to logarithmic in the array size.
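A minimal iterative binary search in Java matching that walkthrough; it assumes the array is sorted in ascending order and returns -1 when the target is absent:

    public class BinarySearch {
        // Returns the index of target in a sorted array, or -1 if it is not present.
        static int search(int[] sorted, int target) {
            int lo = 0, hi = sorted.length - 1;
            while (lo <= hi) {
                int mid = lo + (hi - lo) / 2;    // avoids overflow for large indices
                if (sorted[mid] == target) {
                    return mid;                  // match at the probe point
                } else if (sorted[mid] < target) {
                    lo = mid + 1;                // target is in the upper portion
                } else {
                    hi = mid - 1;                // target is in the lower portion
                }
            }
            return -1; // range is empty: not found
        }
    }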

The filename is the minimum amount of information, and the full relative path is the maximum amount of information, that can be used to locate a given document. Relative-path suffixes fall somewhere between these extremes. Specifying only the minimal amount of information needed to disambiguate a given search makes the documentation more robust to directory reorganization.
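As a concrete illustration of this suffix matching, java.nio.file.Path.endsWith compares whole name elements, which is exactly the disambiguation test described above; the example path here is made up:

    import java.nio.file.Path;
    import java.nio.file.Paths;

    public class SuffixMatch {
        public static void main(String[] args) {
            Path doc = Paths.get("docs/guide/install.md"); // made-up example path

            System.out.println(doc.endsWith("install.md"));            // true: filename only
            System.out.println(doc.endsWith("guide/install.md"));      // true: a longer suffix
            System.out.println(doc.endsWith("docs/guide/install.md")); // true: full relative path
            System.out.println(doc.endsWith("ll.md"));                 // false: whole name elements only
        }
    }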

I know how to write a script if I'm looking for a number of files in a number of directories. However, I need some ideas about the most efficient algorithm, so I don't start a search that might take a month! Let's say I have a list of filenames, and I have to search N SAN storage shares, each holding hundreds of thousands of files, to determine whether any folder contains a file whose name matches one in my list.

That is, someone may have copied a couple of the files, or all of them, to another folder. They may have sprinkled any subset of the files anywhere on the shares. Therefore, I have to walk the entire tree on each share, determine whether each path ends with one of the filenames and, if so, delete that file. I may or may not clean up empty folders; it's not a requirement.

I don't want anyone to write a script for me; I'd like to discuss algorithms and the appropriate data structures for something this large. I'm sure there are many ways to do this and many object classes which I'm not even aware of. Thanks for your thoughts.
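Algorithm-wise, one common shape for this job is a single pass per share with the filename list loaded into a hash set: each file visited then costs an O(1) name lookup, so total work is linear in the number of files on the shares. A minimal sketch, assuming Java 7+; the share root and target names are placeholders, and real code would want logging and a dry-run mode before deleting anything:

    import java.io.IOException;
    import java.nio.file.*;
    import java.nio.file.attribute.BasicFileAttributes;
    import java.util.HashSet;
    import java.util.Set;

    public class SweepShare {
        // One pass over the share: each file's name is checked against the set in O(1).
        static void sweep(Path shareRoot, final Set<String> targets) throws IOException {
            Files.walkFileTree(shareRoot, new SimpleFileVisitor<Path>() {
                @Override
                public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
                    if (targets.contains(file.getFileName().toString())) {
                        Files.delete(file); // path ends with one of the listed filenames
                    }
                    return FileVisitResult.CONTINUE;
                }
            });
        }

        public static void main(String[] args) throws IOException {
            Set<String> targets = new HashSet<>();
            targets.add("report.pdf");                 // placeholder filename
            sweep(Paths.get("/mnt/share1"), targets);  // placeholder share root
        }
    }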


