iLocation based boolean indexing on an integer type is not available
This looks like an odd bug in pandas, and I'm not sure whether it is version-related:
iLocation based boolean indexing on an integer type is not available
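Original code:
import hashlib
import numpy as np

def test_set_check(identifier, test_ratio, hash):
    # A row goes to the test set when the last byte of its ID hash is below 256 * test_ratio
    return hash(np.int64(identifier)).digest()[-1] < 256 * test_ratio

def split_train_test_by_id(data, test_ratio, id_column, hash=hashlib.md5):
    ids = data[id_column]
    in_test_set = ids.apply(lambda id_: test_set_check(id_, test_ratio, hash))
    # ----> raises the error: in_test_set is a boolean Series
    return data.iloc[~in_test_set], data.iloc[in_test_set]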
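Solution:
Convert the mask to a numpy.array, i.e. use Series.values (in_test_set is a Series, not a DataFrame). iloc is purely position-based and does not accept a boolean Series as a mask, but it does accept a plain boolean array.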
import hashlib
import numpy as np

def test_set_check(identifier, test_ratio, hash):
    # A row goes to the test set when the last byte of its ID hash is below 256 * test_ratio
    return hash(np.int64(identifier)).digest()[-1] < 256 * test_ratio

def split_train_test_by_id(data, test_ratio, id_column, hash=hashlib.md5):
    ids = data[id_column]
    in_test_set = ids.apply(lambda id_: test_set_check(id_, test_ratio, hash))
    # ----> changed: pass the mask as a NumPy boolean array
    return data.iloc[(~in_test_set).values], data.iloc[in_test_set.values]
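
To see the problem in isolation, here is a minimal sketch on a made-up four-row DataFrame (the column names id and x and the even-ID mask are my own assumptions, not part of the original code). It tries iloc with a boolean Series, then shows the .values fix next to two alternatives that should give the same rows on recent pandas versions: Series.to_numpy() and label-based .loc, which accepts a boolean Series directly.

```python
import pandas as pd

# Hypothetical toy data, only for illustration
df = pd.DataFrame({"id": [0, 1, 2, 3], "x": [10, 20, 30, 40]})
mask = df["id"] % 2 == 0  # boolean Series with the default integer index

# iloc rejects a boolean Series as a mask
try:
    df.iloc[mask]
    raised = False
except (NotImplementedError, ValueError) as exc:
    raised = True
    print(type(exc).__name__, exc)

# Fix used above: hand iloc a plain NumPy boolean array
by_values = df.iloc[mask.values]

# Alternatives: to_numpy() gives the same array, and loc takes the Series as-is
by_numpy = df.iloc[mask.to_numpy()]
by_loc = df.loc[mask]
```

If you do not need positional indexing at all, data.loc[in_test_set] is probably the simplest choice, since it aligns the mask with the DataFrame index for you.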