trajdl.datasets.open_source.hasher module
- class trajdl.datasets.open_source.hasher.Hasher(hasher_type: str)
Bases: object
- digest_arrow(table: Table, max_chunksize: int = 8192) -> str
Digest a PyArrow Table and produce its hash.
- Parameters:
table (pa.Table) – The PyArrow table to be hashed.
max_chunksize (int, optional) – The maximum size of each chunk for processing (default is 8192).
- Returns:
The hexadecimal representation of the table hash.
- Return type:
str
- digest_file(path: str) -> str
Digest a file and produce its hash.
- Parameters:
path (str) – The path to the file to be hashed.
- Returns:
The hexadecimal representation of the file hash.
- Return type:
str
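As a rough illustration of what hashing a file to a hexadecimal string typically involves, here is a minimal sketch built only on Python's standard `hashlib`. It is not the library's implementation: the function name `digest_file_sketch`, the `algorithm` and `chunk_size` parameters, and the choice of SHA-256 are all assumptions made for this example, and the values that `hasher_type` accepts are not documented in this section.

```python
import hashlib


def digest_file_sketch(path: str, algorithm: str = "sha256", chunk_size: int = 8192) -> str:
    """Hypothetical stand-in for a file digest; not trajdl's actual code."""
    # hashlib.new() looks up a hash constructor by name (assumed here to be "sha256").
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # Read fixed-size chunks so a large file never has to fit in memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    # hexdigest() returns the hash as a string of hexadecimal characters.
    return hasher.hexdigest()
```

Because the hash object is updated incrementally over a byte stream, the chunk size in this sketch affects only memory use, not the resulting digest. Whether the same holds for `max_chunksize` in `digest_arrow` and `digest_parquet` depends on how the library serializes each chunk, which this page does not specify.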
- digest_parquet(path: str, max_chunksize: int = 8192) -> str
Digest a Parquet file and produce its hash.
- Parameters:
path (str) – The path to the Parquet file to be hashed.
max_chunksize (int, optional) – The maximum size of each chunk for processing (default is 8192).
- Returns:
The hexadecimal representation of the Parquet file hash.
- Return type:
str