iLocation based boolean indexing on an integer type is not available


This is an odd bug in pandas, and I'm not sure whether it depends on the version:

iLocation based boolean indexing on an integer type is not available

Original code:

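import hashlib
import numpy as np

def test_set_check(identifier, test_ratio, hash):
    # An instance goes to the test set if the last byte of its ID hash is below 256 * test_ratio
    return hash(np.int64(identifier)).digest()[-1] < 256 * test_ratio

def split_train_test_by_id(data, test_ratio, id_column, hash=hashlib.md5):
    ids = data[id_column]
    in_test_set = ids.apply(lambda id_: test_set_check(id_, test_ratio, hash))
    # ----> raises the error: in_test_set is a boolean Series, which .iloc rejects
    return data.iloc[~in_test_set], data.iloc[in_test_set]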

Solution: convert the mask to a numpy array, i.e. use .values (Series.values here, since the mask is a Series)

import hashlib
import numpy as np

def test_set_check(identifier, test_ratio, hash):
    # An instance goes to the test set if the last byte of its ID hash is below 256 * test_ratio
    return hash(np.int64(identifier)).digest()[-1] < 256 * test_ratio

def split_train_test_by_id(data, test_ratio, id_column, hash=hashlib.md5):
    ids = data[id_column]
    in_test_set = ids.apply(lambda id_: test_set_check(id_, test_ratio, hash))

    # ----> fix: pass plain numpy boolean arrays to .iloc
    return data.iloc[(~in_test_set).values], data.iloc[in_test_set.values]
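
To check the fix on a small example, here is a minimal sketch. The toy DataFrame, its "id" and "value" column names, and the helper name hash_below_ratio (which mirrors test_set_check) are assumptions made up for illustration. As far as I can tell, .iloc is purely positional and refuses a boolean Series that carries an index, so besides .values, the label-based .loc should also accept the mask directly:

```python
import hashlib

import numpy as np
import pandas as pd


def hash_below_ratio(identifier, test_ratio, hash=hashlib.md5):
    # Same check as test_set_check: last byte of the ID hash vs. 256 * test_ratio.
    return hash(np.int64(identifier)).digest()[-1] < 256 * test_ratio


# Hypothetical toy data, only for demonstration.
data = pd.DataFrame({"id": range(100), "value": np.arange(100) * 2})
in_test_set = data["id"].apply(lambda id_: hash_below_ratio(id_, 0.2))

# Passing the boolean Series itself to .iloc is rejected.
try:
    data.iloc[in_test_set]
    iloc_error = None
except (NotImplementedError, ValueError) as exc:
    iloc_error = exc

# Option 1 (the fix from this post): drop the index with .values.
train_a = data.iloc[(~in_test_set).values]
test_a = data.iloc[in_test_set.values]

# Option 2: .loc accepts an index-aligned boolean Series as a mask.
train_b = data.loc[~in_test_set]
test_b = data.loc[in_test_set]

print(type(iloc_error).__name__, iloc_error)
print(len(train_a), len(test_a))
```

Both options should select the same rows. The .loc variant reads a little more naturally when the mask comes from the same DataFrame, while .values keeps the code closest to the original.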