#random

Chris Comeau

06/15/2023, 3:04 AM
Does anyone know a good Python library for finding the minimal compatible type for an iterable of strings? I see options for doing this during read_csv in pandas, polars, pyarrow, duckdb, etc., but oddly can't find anything that will try to detect and narrow the numeric types of an existing dataframe/series. I'm working from a one-off XML source where every value is a quoted string, and trying to generate an efficient database table schema with the minimal appropriate type for each column. I might wind up making something myself, probably working with arrow tables, but I'm trying to avoid reinventing the wheel.

I'm thinking that as the type detector runs, the set of compatible types shrinks whenever an incompatible string shows up. So say your iterable starts with '1', '2'... so far unsigned 8-bit int works... then it hits a '-1', so it's got to be signed... '10000', so signed 16-bit int... then '1.2', so float... then 'B', so it can't do better than string. The function short-circuits and returns the string type at that point.
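A rough sketch of that narrowing idea in plain Python, in case it helps frame the question. The function name `minimal_type`, the type-name strings, and the candidate order (unsigned before signed at each width, then float64, then string) are all assumptions for illustration, not taken from any library. It tracks the running min/max of the integers seen rather than an explicit set of candidate types, which gives the same answer for this ladder.

```python
from typing import Iterable

# Hypothetical integer candidates, narrowest first; unsigned is tried
# before signed at each width, matching the example in the message.
_INT_TYPES = [
    ("uint8", 0, 2**8 - 1),
    ("int8", -(2**7), 2**7 - 1),
    ("uint16", 0, 2**16 - 1),
    ("int16", -(2**15), 2**15 - 1),
    ("uint32", 0, 2**32 - 1),
    ("int32", -(2**31), 2**31 - 1),
    ("uint64", 0, 2**64 - 1),
    ("int64", -(2**63), 2**63 - 1),
]


def minimal_type(values: Iterable[str]) -> str:
    """Return the narrowest type name that can hold every string in `values`."""
    lo = hi = None      # running min/max of the integers seen so far
    is_float = False    # becomes True once a non-integer number shows up

    for raw in values:
        text = raw.strip()
        if not is_float:
            try:
                n = int(text)
            except ValueError:
                pass    # not an integer; fall through to the float check
            else:
                lo = n if lo is None else min(lo, n)
                hi = n if hi is None else max(hi, n)
                continue
        try:
            # Caveat: Python's parsers are lenient, so 'nan', 'inf' and
            # '1e5' count as floats here, and int() accepts '1_000'.
            float(text)
        except ValueError:
            return "string"   # short-circuit: nothing narrower can work
        is_float = True

    if is_float:
        return "float64"
    if lo is None:
        return "string"       # assumption: empty input falls back to string
    for name, tmin, tmax in _INT_TYPES:
        if tmin <= lo and hi <= tmax:
            return name
    return "string"           # integer too large for 64 bits


print(minimal_type(["1", "2"]))                         # uint8
print(minimal_type(["1", "2", "-1", "10000"]))          # int16
print(minimal_type(["1", "2", "-1", "10000", "1.2"]))   # float64
```

Nulls, empty strings, dates, booleans, and float32 vs. float64 precision are all ignored here; a real schema generator would need a policy for each.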

Casper Weiss Bang

06/15/2023, 7:52 AM
Worst case, if it's Python-specific, I can recommend looking at the https://pyslackers.com/web community. It's another Slack community where people are really friendly with general Python questions.

Chris Comeau

06/15/2023, 1:02 PM
All right, thanks