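How do you remove accents from a string in Python?

The common approach is to normalize the string with unicodedata.normalize('NFD', string) and then drop every character whose Unicode category is Mn (nonspacing mark). The unidecode library is another option.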

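How do you use unidecode?

unidecode is a third-party library that transliterates a Unicode string into ASCII. Import the function with from unidecode import unidecode and call unidecode(text); it returns the string with accented letters replaced by their closest ASCII equivalents.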

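What are the advantages of removing accents with unicodedata?

unicodedata is part of the Python standard library, so nothing extra needs to be installed. By choosing the normalization form and the character categories to filter, it also gives fine-grained control over how accents and ligatures are handled.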

[Python] How to remove accents

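The usual way to strip accents from a Unicode string in Python is to normalize it with the unicodedata module and then filter by Unicode category. unicodedata.normalize('NFD', string) decomposes each accented character into a base character followed by its combining marks, and the characters whose category is Mn (nonspacing mark) are then dropped.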
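To make the decomposition step concrete, the following minimal sketch inspects what NFD does to a single accented letter. The variable names are illustrative only.

```python
import unicodedata

text = "é"  # one precomposed code point, U+00E9
decomposed = unicodedata.normalize("NFD", text)

# After NFD the accent is a separate combining character,
# so the string is now two code points long.
for ch in decomposed:
    print(f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch))
# U+0065 Ll LATIN SMALL LETTER E
# U+0301 Mn COMBINING ACUTE ACCENT
```

Because the accent ends up as its own character with category Mn, filtering on that category removes it and leaves the base letter.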

Example using `unicodedata`

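import unicodedata

def remove_accents(input_str):
    # Decompose each character into a base character plus combining marks (NFD).
    normalized_str = unicodedata.normalize('NFD', input_str)
    # Drop the combining marks (category Mn) and keep everything else.
    return ''.join(c for c in normalized_str if unicodedata.category(c) != 'Mn')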

This approach normalizes the original string into its decomposed form and keeps only the unaccented base characters.
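A short usage sketch may help show what this function does and does not change. The sample strings are chosen for illustration; the function is repeated so that the block runs on its own.

```python
import unicodedata

def remove_accents(input_str):
    normalized_str = unicodedata.normalize("NFD", input_str)
    return "".join(c for c in normalized_str if unicodedata.category(c) != "Mn")

print(remove_accents("François"))      # Francois
print(remove_accents("Crème brûlée"))  # Creme brulee

# Letters without a canonical decomposition are left untouched.
print(remove_accents("smørrebrød"))    # smørrebrød

# The filter is not limited to Latin accents: the Japanese voiced
# sound mark is also a category-Mn combining character after NFD.
print(remove_accents("が"))            # か
```

The last two cases suggest checking the function against the scripts that actually appear in your data before applying it broadly.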

Using `unidecode`

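The unidecode library converts a Unicode string to its closest ASCII representation. As a result, accented letters such as those in French or Spanish are replaced by their unaccented ASCII counterparts.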

from unidecode import unidecode
print(unidecode('François'))

This code converts François to Francois.
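The two approaches can give different results for letters that have no canonical decomposition. The sketch below compares them side by side. It assumes unidecode may not be installed (it is a third-party package), so the import is guarded; the sample words are illustrative.

```python
import unicodedata

try:
    from unidecode import unidecode  # third-party: pip install Unidecode
except ImportError:
    unidecode = None  # the unidecode column is skipped when it is missing

def remove_accents(input_str):
    normalized_str = unicodedata.normalize("NFD", input_str)
    return "".join(c for c in normalized_str if unicodedata.category(c) != "Mn")

for word in ["François", "Łódź", "Straße"]:
    stdlib_result = remove_accents(word)
    ascii_result = unidecode(word) if unidecode else "(unidecode not installed)"
    print(f"{word!r}: unicodedata -> {stdlib_result!r}, unidecode -> {ascii_result!r}")
```

With the unicodedata approach, "Ł" and "ß" survive because NFD does not split them into a base letter and a mark, whereas unidecode aims for pure ASCII output and transliterates them.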

Summary

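Accents can be removed either with unicodedata from the Python standard library or, more simply, with the unidecode library. Choose between them according to your use case; unicodedata is the convenient option when you also need control over normalization.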

Reference:

What is the best way to remove accents (normalize) in a `Python` unicode string? - Stack Overflow
