Configuration

| Property name | Default | Meaning | Since version |
| --- | --- | --- | --- |
| `spark.hyperspace.system.path` | `<value of spark.sql.warehouse.dir>/indexes` | Root directory in which Hyperspace stores index files. | 0.1.0 |
| `spark.hyperspace.index.numBuckets` | Value of `spark.sql.shuffle.partitions`, which defaults to 200 | Number of buckets to use when creating covering indexes. | 0.1.0 |
| `spark.hyperspace.index.cache.expiryDurationInSeconds` | 300 | Number of seconds after the last index modification action before the index metadata cache is marked as stale. | 0.1.0 |
| `spark.hyperspace.explain.displayMode` | `"plaintext"` | Display mode for Hyperspace `explain()` output. Valid values: `"console"`, `"plaintext"`, `"html"`. | 0.1.0 |
| `spark.hyperspace.explain.displayMode.highlight.beginTag` | `""` (an empty string) | Tag that marks the beginning of the highlighted portion of `explain()` output, according to the display mode. | 0.1.0 |
| `spark.hyperspace.explain.displayMode.highlight.endTag` | `""` (an empty string) | Tag that marks the end of the highlighted portion of `explain()` output, according to the display mode. | 0.1.0 |
| `spark.hyperspace.index.lineage.enabled` | false | Add a lineage column to the index on creation to track the source data file of each index record. Lineage is required to handle deleted files in Hybrid Scan or to refresh an index in incremental mode. Adding lineage increases the size of the index in proportion to the number of distinct source data files the index is built on. | 0.3.0 |
| `spark.hyperspace.index.optimize.fileSizeThreshold` | 256MB | Size threshold, in bytes, for index files during optimization. Files smaller than this threshold are eligible for merging during index optimization. | 0.3.0 |
| `spark.hyperspace.index.hybridscan.enabled` | false | Enable Hybrid Scan at query execution time. If enabled, Hyperspace considers an index whose source data has appended and/or deleted files as a candidate during query optimization, at the cost of some additional optimization overhead. | 0.3.0 |
| `spark.hyperspace.index.hybridscan.maxAppendedRatio` | 0.3 | Maximum ratio of the total bytes of appended files for which Hybrid Scan is applied. If the appended data exceeds this threshold, Hybrid Scan is not applied. The default is 0.3 (30%). | 0.4.0 |
| `spark.hyperspace.index.hybridscan.maxDeletedRatio` | 0.2 | Maximum ratio of the total bytes of deleted files for which Hybrid Scan is applied. If the deleted data exceeds this threshold, Hybrid Scan is not applied. The default is 0.2 (20%). Lineage is required to handle deleted files with Hybrid Scan. | 0.4.0 |
| `spark.hyperspace.source.globbingPattern` | N/A | DataFrameReader option (not a Spark config) that allows indexes on globbing patterns. E.g. `spark.read.option("spark.hyperspace.source.globbingPattern", "/temp/*/*").parquet("/temp/*/*")` | 0.4.0 |
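
As a rough illustration of how these properties might be set, the sketch below applies a handful of them through a session's `conf.set(key, value)` interface, which is how PySpark's `SparkSession.conf` accepts runtime settings. The helper `apply_hyperspace_conf`, the `FakeSession` stand-in, and all the values are hypothetical and exist only so the example runs without a Spark installation; only the property names come from the table above.

```python
# Illustrative values only; the property names are taken from the table above.
HYPERSPACE_SETTINGS = {
    "spark.hyperspace.system.path": "/tmp/hyperspace/indexes",
    "spark.hyperspace.index.numBuckets": "50",
    "spark.hyperspace.index.lineage.enabled": "true",
    "spark.hyperspace.index.hybridscan.enabled": "true",
}

# Per the table, this one is a DataFrameReader option, not a session config.
READER_ONLY_OPTION = "spark.hyperspace.source.globbingPattern"


def apply_hyperspace_conf(session, settings):
    """Set each property on session.conf; return the keys that were applied.

    Hypothetical helper: `session` is anything exposing conf.set(key, value),
    such as a PySpark SparkSession.
    """
    applied = []
    for key, value in settings.items():
        if key == READER_ONLY_OPTION:
            raise ValueError(f"{key} must be passed to spark.read.option(...)")
        if not key.startswith("spark.hyperspace."):
            raise ValueError(f"not a Hyperspace property: {key}")
        session.conf.set(key, value)
        applied.append(key)
    return applied


class _FakeConf:
    """Stand-in for SparkSession.conf so the sketch runs without Spark."""

    def __init__(self):
        self.values = {}

    def set(self, key, value):
        self.values[key] = value

    def get(self, key, default=None):
        return self.values.get(key, default)


class FakeSession:
    """Stand-in for a SparkSession (assumption, for illustration only)."""

    def __init__(self):
        self.conf = _FakeConf()


if __name__ == "__main__":
    session = FakeSession()
    for key in apply_hyperspace_conf(session, HYPERSPACE_SETTINGS):
        print(key, "=", session.conf.get(key))
```

With a real session, the same helper would be called as `apply_hyperspace_conf(spark, HYPERSPACE_SETTINGS)`. The globbing pattern is rejected here on purpose, since it belongs on the reader rather than in the session configuration.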

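The two Hybrid Scan ratio thresholds in the table can be read as a simple eligibility check. The sketch below is not Hyperspace's implementation. The function name and parameters are hypothetical, and it assumes both ratios are measured against a single total size in bytes supplied by the caller, which the table does not specify.

```python
def hybrid_scan_applicable(appended_bytes, deleted_bytes, total_bytes,
                           lineage_enabled,
                           max_appended_ratio=0.3, max_deleted_ratio=0.2):
    """Return True if both ratios are within their thresholds.

    Assumption: ratios are computed against `total_bytes`; the denominator
    Hyperspace actually uses is not stated in the table.
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    # Per the table, lineage is required to handle deleted files.
    if deleted_bytes > 0 and not lineage_enabled:
        return False
    appended_ratio = appended_bytes / total_bytes
    deleted_ratio = deleted_bytes / total_bytes
    # More data than the threshold means Hybrid Scan is not applied.
    return (appended_ratio <= max_appended_ratio
            and deleted_ratio <= max_deleted_ratio)


if __name__ == "__main__":
    # 25% appended, nothing deleted: within the default 30% threshold.
    print(hybrid_scan_applicable(25, 0, 100, lineage_enabled=False))
    # 35% appended: above the default 30% threshold.
    print(hybrid_scan_applicable(35, 0, 100, lineage_enabled=False))
```

Under these assumptions, raising `max_appended_ratio` lets an index remain a candidate with more unindexed appended data, at the price of scanning more source files at query time.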