Package org.apache.iceberg.util
Class FileSystemWalker
java.lang.Object
org.apache.iceberg.util.FileSystemWalker
Utility class for recursively traversing file systems and identifying hidden paths. Provides
 methods to list files recursively while filtering out hidden paths based on specified criteria.
Method Summary
- static void listDirRecursivelyWithFileIO(SupportsPrefixOperations io, String dir, Map<Integer, PartitionSpec> specs, Predicate<FileInfo> filter, Consumer<String> fileConsumer)
  Recursively lists files in the specified directory that satisfy the given conditions.
- static void listDirRecursivelyWithHadoop(String dir, Map<Integer, PartitionSpec> specs, Predicate<org.apache.hadoop.fs.FileStatus> filter, org.apache.hadoop.conf.Configuration conf, int maxDepth, int maxDirectSubDirs, Consumer<String> directoryConsumer, Consumer<String> fileConsumer)
  Recursively traverses the specified directory using Hadoop FileSystem API to collect file paths that meet the conditions.
Method Details

listDirRecursivelyWithFileIO
public static void listDirRecursivelyWithFileIO(SupportsPrefixOperations io, String dir, Map<Integer, PartitionSpec> specs, Predicate<FileInfo> filter, Consumer<String> fileConsumer)
Recursively lists files in the specified directory that satisfy the given conditions. Use FileSystemWalker.PartitionAwareHiddenPathFilter to filter out hidden paths.
Parameters:
- io - FileIO implementation that supports prefix operations
- dir - Base directory from which to start the recursive listing
- specs - Map of partition specs for this table. Used to prevent partition directories from being filtered out as hidden paths.
- filter - File filter condition; only files satisfying this condition are collected
- fileConsumer - Consumer that accepts the locations of matching files
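
The role of specs in hidden-path filtering can be illustrated with a small standalone sketch. It is not the FileSystemWalker.PartitionAwareHiddenPathFilter implementation. It assumes that "hidden" means a name starting with "_" or ".", and that a partition directory is named "<field>=<value>". The class HiddenPathSketch, its methods, and the plain set of field names standing in for Map<Integer, PartitionSpec> are all hypothetical.

```java
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical illustration only; not part of org.apache.iceberg.util.
class HiddenPathSketch {

  // Returns a predicate that is true for names that should be kept.
  // Assumption: names starting with "_" or "." count as hidden, unless the
  // name has the form "<field>=<value>" for a known partition field.
  static Predicate<String> partitionAwareHiddenFilter(Set<String> partitionFieldNames) {
    return name -> {
      boolean looksHidden = name.startsWith("_") || name.startsWith(".");
      if (!looksHidden) {
        return true;
      }
      int eq = name.indexOf('=');
      return eq > 0 && partitionFieldNames.contains(name.substring(0, eq));
    };
  }

  // Convenience wrapper: keeps only the names that pass the filter.
  static List<String> visible(List<String> names, Set<String> partitionFieldNames) {
    return names.stream()
        .filter(partitionAwareHiddenFilter(partitionFieldNames))
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<String> names = List.of("_tmp", "_region=us", ".hidden", "data.parquet");
    // "_region" is a partition field whose name happens to start with "_".
    System.out.println(visible(names, Set.of("_region")));
  }
}
```

Without the set of partition field names, the directory "_region=us" would be skipped along with "_tmp", and every data file beneath it would be missed by the listing.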
 

listDirRecursivelyWithHadoop
public static void listDirRecursivelyWithHadoop(String dir, Map<Integer, PartitionSpec> specs, Predicate<org.apache.hadoop.fs.FileStatus> filter, org.apache.hadoop.conf.Configuration conf, int maxDepth, int maxDirectSubDirs, Consumer<String> directoryConsumer, Consumer<String> fileConsumer)
Recursively traverses the specified directory using the Hadoop FileSystem API to collect file paths that meet the conditions.
This method limits both the recursion depth and the number of subdirectories:
- Stops traversal when the maximum recursion depth is reached and adds the current directory to the pending list
- Stops traversal when the number of direct subdirectories exceeds the threshold and adds those subdirectories to the pending list
Parameters:
- dir - The starting directory path to traverse
- specs - Map of partition specs for this table. Used to prevent partition directories from being filtered out as hidden paths.
- filter - File filter condition; only files satisfying this condition are collected
- conf - Hadoop configuration used to load the FileSystem
- maxDepth - Maximum recursion depth
- maxDirectSubDirs - Upper limit on the number of direct subdirectories that are traversed in place
- directoryConsumer - Consumer that receives the paths of directories left unprocessed (pending)
- fileConsumer - Consumer that receives the paths of qualifying files
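
The two stopping rules described for this method can be sketched on a local file system with java.nio.file instead of the Hadoop FileSystem API. This is an illustration of the documented behavior under stated assumptions, not the Iceberg implementation. The class BoundedWalkSketch and its walk method are hypothetical, hidden-path filtering is omitted, and the sketch assumes that files directly inside an over-limit directory are still collected before its subdirectories are handed to directoryConsumer.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Hypothetical illustration only; not part of org.apache.iceberg.util.
class BoundedWalkSketch {

  static void walk(Path dir, Predicate<Path> filter, int maxDepth, int maxDirectSubDirs,
                   Consumer<String> directoryConsumer, Consumer<String> fileConsumer) {
    // Rule 1: depth budget exhausted, so hand the current directory back as pending.
    if (maxDepth <= 0) {
      directoryConsumer.accept(dir.toString());
      return;
    }

    List<Path> subDirs = new ArrayList<>();
    try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
      for (Path entry : entries) {
        if (Files.isDirectory(entry)) {
          subDirs.add(entry);
        } else if (filter.test(entry)) {
          fileConsumer.accept(entry.toString());
        }
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }

    // Rule 2: too many direct subdirectories, so hand all of them back as pending.
    if (subDirs.size() > maxDirectSubDirs) {
      subDirs.forEach(d -> directoryConsumer.accept(d.toString()));
      return;
    }

    for (Path sub : subDirs) {
      walk(sub, filter, maxDepth - 1, maxDirectSubDirs, directoryConsumer, fileConsumer);
    }
  }

  public static void main(String[] args) throws IOException {
    // Build a small temporary tree: root/a.txt, root/d1/b.txt, root/d1/d2/c.txt
    Path root = Files.createTempDirectory("walk-sketch");
    Path d2 = Files.createDirectories(root.resolve("d1").resolve("d2"));
    Files.createFile(root.resolve("a.txt"));
    Files.createFile(root.resolve("d1").resolve("b.txt"));
    Files.createFile(d2.resolve("c.txt"));

    List<String> files = new ArrayList<>();
    List<String> pending = new ArrayList<>();
    walk(root, path -> true, 2, 10, pending::add, files::add);
    System.out.println("files:   " + files);
    System.out.println("pending: " + pending);
  }
}
```

With maxDepth set to 2 in the usage above, the walk collects a.txt and b.txt and reports d2 as pending. A caller can then process the pending directories separately, for example by distributing them across parallel workers.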
 
 