[A-00092] Using the Document AI API (GCP) from Python

lofichilladmin  Published: July 9, 2023  Categories: GCP, Python  Tags: API, document ai, gcp, OCR, pdf, python, scanning, screening, business forms, reading

This article describes how to use the Document AI API provided by Google Cloud from Python.

The official documentation is here:

https://cloud.google.com/document-ai/docs/libraries

・Setup

Install the Python client library.

pip install --upgrade google-cloud-documentai
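To confirm that the install succeeded, a minimal sketch like the following can report which version of the google-cloud-documentai distribution is present, using only the standard library's importlib.metadata. The helper name installed_version is hypothetical, not part of the Document AI library.

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a pip distribution, or None if it is missing."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("google-cloud-documentai")
    if version is None:
        print("google-cloud-documentai is not installed; run the pip command above.")
    else:
        print(f"google-cloud-documentai {version} is installed.")
```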

・Check the available processor types before using Document AI

Before using Document AI, you need to decide which processor type fits your documents.

You can list the available processor types as follows.

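from google.api_core.client_options import ClientOptions
from google.cloud import documentai  # type: ignore

PROJECT_ID = "<project_id>"  # replace with your GCP project ID
LOCATION = "us"  # processor location, e.g. "us" or "eu"


if __name__ == "__main__":
    # Point the client at the regional endpoint that matches LOCATION
    opts = ClientOptions(api_endpoint=f"{LOCATION}-documentai.googleapis.com")
    client = documentai.DocumentProcessorServiceClient(client_options=opts)

    # Build the request: the parent is the project/location to query
    request = documentai.FetchProcessorTypesRequest(
        parent=f"projects/{PROJECT_ID}/locations/{LOCATION}",
    )

    # Make the request and print each available processor type
    response = client.fetch_processor_types(request=request)
    for processor_type in response.processor_types:
        print(processor_type.type_, processor_type.category)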

From the processor types retrieved above, choose the one that matches your documents.

In this case the input is a PDF file, so it is read with OCR_PROCESSOR.

・Read a document (create a new processor)
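Instead of scanning the printed response by eye, the processor types can be filtered in code. The sketch below assumes the response.processor_types list returned by fetch_processor_types, where each entry exposes a type_ string; the helpers processor_type_names and find_processor_type are hypothetical. Because they only read the type_ attribute, they can be tried offline on stand-in objects, as in the main section.

```python
from types import SimpleNamespace
from typing import Iterable, List, Optional


def processor_type_names(processor_types: Iterable) -> List[str]:
    """Return the sorted type names (e.g. "OCR_PROCESSOR") of the given processor types."""
    return sorted(pt.type_ for pt in processor_types)


def find_processor_type(processor_types: Iterable, wanted: str) -> Optional[object]:
    """Return the first processor type whose type_ equals `wanted`, or None if absent."""
    return next((pt for pt in processor_types if pt.type_ == wanted), None)


if __name__ == "__main__":
    # Illustrative stand-ins shaped like documentai.ProcessorType (not real API output).
    # With the real API, pass response.processor_types instead.
    sample = [
        SimpleNamespace(type_="OCR_PROCESSOR"),
        SimpleNamespace(type_="FORM_PARSER_PROCESSOR"),
    ]
    print(processor_type_names(sample))  # ['FORM_PARSER_PROCESSOR', 'OCR_PROCESSOR']
    ocr = find_processor_type(sample, "OCR_PROCESSOR")
    print(ocr.type_ if ocr else "OCR_PROCESSOR not offered")  # OCR_PROCESSOR
```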

Place the document you want to read in a local folder of your choice.

The file used for this test is:

test (download)
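Once the file is in place, a new OCR processor can be created and the PDF sent to it. The following is a minimal sketch, not the author's code, assuming the "us" location, a hypothetical display name "ocr-test", and a local file named test.pdf; the helper names make_client, create_ocr_processor, ocr_file and read_document are hypothetical. It relies on the client library's create_processor and process_document methods, and imports the Google Cloud packages inside the functions that need them so that read_document can be tried without credentials.

```python
import mimetypes
from pathlib import Path
from typing import Tuple

PROJECT_ID = "<project_id>"  # replace with your GCP project ID
LOCATION = "us"              # assumption: the processor lives in the "us" location


def read_document(path: str) -> Tuple[bytes, str]:
    """Read a local file and guess its MIME type (application/pdf for .pdf files)."""
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None:
        raise ValueError(f"Cannot determine MIME type of {path}")
    return Path(path).read_bytes(), mime_type


def make_client(location: str):
    """Create a Document AI client bound to the regional endpoint for `location`."""
    from google.api_core.client_options import ClientOptions
    from google.cloud import documentai

    opts = ClientOptions(api_endpoint=f"{location}-documentai.googleapis.com")
    return documentai.DocumentProcessorServiceClient(client_options=opts)


def create_ocr_processor(client, project_id: str, location: str, display_name: str):
    """Create a new OCR_PROCESSOR; the returned Processor's .name is its resource path."""
    from google.cloud import documentai

    return client.create_processor(
        parent=f"projects/{project_id}/locations/{location}",
        processor=documentai.Processor(display_name=display_name, type_="OCR_PROCESSOR"),
    )


def ocr_file(client, processor_name: str, path: str) -> str:
    """Send a local file to the processor and return the extracted text."""
    from google.cloud import documentai

    content, mime_type = read_document(path)
    request = documentai.ProcessRequest(
        name=processor_name,
        raw_document=documentai.RawDocument(content=content, mime_type=mime_type),
    )
    result = client.process_document(request=request)
    return result.document.text


if __name__ == "__main__":
    client = make_client(LOCATION)
    processor = create_ocr_processor(client, PROJECT_ID, LOCATION, "ocr-test")  # hypothetical name
    print(processor.name)
    print(ocr_file(client, processor.name, "test.pdf"))  # hypothetical local file
```

Each run of the main section creates another processor, so after the first run it is simpler to keep the printed processor.name and pass it straight to ocr_file.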
