Class PDFLoader

A document loader that extends the BufferLoader class and loads documents from PDF files.

Example

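import { PDFLoader } from "langchain/document_loaders/fs/pdf";

// Load the PDF; the result is an array of Document objects.
const loader = new PDFLoader("path/to/bitcoin.pdf");
const docs = await loader.load();
console.log({ docs });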

Hierarchy

BufferLoader
- PDFLoader

Constructors

constructor

new PDFLoader(filePathOrBlob, __namedParameters?): PDFLoader
Parameters
- filePathOrBlob: string | Blob
- __namedParameters: object = {}, with the following optional fields:
  - parsedItemSeparator: undefined | string
  - pdfjs: undefined | (() => Promise<{
        getDocument: {
            (src): PDFDocumentLoadingTask;
            (src): PDFDocumentLoadingTask;
        };
        version: string;
    }>)
  - splitPages: undefined | boolean
Returns PDFLoader
Overrides BufferLoader.constructor
- Defined in langchain/src/document_loaders/fs/pdf.ts:22

Properties

filePathOrBlob

filePathOrBlob: string | Blob

`Protected` parsedItemSeparator

parsedItemSeparator: string

Methods

load

load(): Promise<Document<Record<string, any>>[]>
Reads the buffer contents and metadata according to the type of filePathOrBlob (a file path or a Blob), then calls parse() on the buffer and returns the resulting documents.

Returns Promise<Document<Record<string, any>>[]>
Promise that resolves with an array of Document objects.
Inherited from BufferLoader.load
- Defined in langchain/src/document_loaders/fs/buffer.ts:36
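
The dispatch on the type of filePathOrBlob can be sketched roughly as follows. This is an illustration, not the library's implementation: `readSource`, the `ReadFile` type, `fakeReadFile`, and the metadata keys `source` and `blobType` are assumptions made for the example, and the file reader is passed in so the sketch runs without touching the file system.

```typescript
// Hypothetical reader type, standing in for readFile from fs/promises.
type ReadFile = (path: string) => Promise<Uint8Array>;

interface Loaded {
  bytes: Uint8Array;
  metadata: Record<string, unknown>;
}

// Hypothetical helper: pick a read strategy based on the input type.
async function readSource(
  filePathOrBlob: string | Blob,
  readFile: ReadFile
): Promise<Loaded> {
  if (typeof filePathOrBlob === "string") {
    // A string is treated as a path and read from disk.
    return {
      bytes: await readFile(filePathOrBlob),
      metadata: { source: filePathOrBlob }, // assumed metadata shape
    };
  }
  // A Blob already holds the bytes in memory.
  const bytes = new Uint8Array(await filePathOrBlob.arrayBuffer());
  return {
    bytes,
    metadata: { source: "blob", blobType: filePathOrBlob.type }, // assumed
  };
}

// Stand-in reader so the example does not need a real file.
const fakeReadFile: ReadFile = async (path) =>
  new TextEncoder().encode(`contents of ${path}`);
```

In the real loader, the bytes and metadata produced at this stage are what parse() receives.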

loadAndSplit

loadAndSplit(splitter?): Promise<Document<Record<string, any>>[]>
Loads the documents and splits them using a specified text splitter.
Parameters
- splitter: TextSplitter = ...
Returns Promise<Document<Record<string, any>>[]>
A Promise that resolves with an array of Document instances, each split according to the provided TextSplitter.
Inherited from BufferLoader.loadAndSplit
- Defined in langchain/src/document_loaders/base.ts:32
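
To make the "load, then split" step concrete, here is a minimal sketch of splitting already-loaded documents into chunks. The `Doc` interface and the `splitBySize` helper are hypothetical stand-ins for the library's Document and TextSplitter; a real text splitter is typically asynchronous and respects sentence or paragraph boundaries rather than cutting at a fixed size.

```typescript
// Minimal stand-in for a Document.
interface Doc {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Hypothetical fixed-size splitter, used only to illustrate the idea.
function splitBySize(docs: Doc[], chunkSize: number): Doc[] {
  if (chunkSize <= 0) {
    throw new RangeError("chunkSize must be positive");
  }
  const chunks: Doc[] = [];
  for (const doc of docs) {
    for (let start = 0; start < doc.pageContent.length; start += chunkSize) {
      chunks.push({
        pageContent: doc.pageContent.slice(start, start + chunkSize),
        metadata: { ...doc.metadata }, // each chunk keeps its parent's metadata
      });
    }
  }
  return chunks;
}

const loaded: Doc[] = [
  { pageContent: "abcdefghij", metadata: { source: "a.pdf" } },
];
const chunks = splitBySize(loaded, 4);
console.log(chunks.map((c) => c.pageContent)); // "abcd", "efgh", "ij"
```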

parse

parse(raw, metadata): Promise<Document<Record<string, any>>[]>
Takes a raw buffer and metadata and returns a promise that resolves to an array of Document instances. It loads the PDF from the buffer with the getDocument function from the PDF.js library, then iterates over the pages. For each page it retrieves the text content with the getTextContent method, joins the text items to form the page content, and creates a Document instance with that content and the metadata. The return value depends on splitPages:
- If splitPages is true, it returns the array of per-page Document instances.
- Otherwise, if there are no documents, it returns an empty array.
- Otherwise, it concatenates the page content of all documents and returns a single Document instance with the concatenated content.
Parameters
- raw: Buffer
  
  The buffer to be parsed.
- metadata: Record<string, any>
  
  The metadata of the document.
Returns Promise<Document<Record<string, any>>[]>
A promise that resolves to an array of Document instances.
Overrides BufferLoader.parse
- Defined in langchain/src/document_loaders/fs/pdf.ts:53
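
The splitPages branching described above can be sketched without PDF.js by starting from text items that have already been extracted. Everything here is illustrative: `Doc`, `pagesToDocuments`, the `pageNumber` metadata key, the empty-string default for the item separator, and the blank-line separator between pages are assumptions rather than the library's confirmed behavior.

```typescript
// Minimal stand-in for a Document.
interface Doc {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Hypothetical helper: turn per-page text items into documents.
function pagesToDocuments(
  pages: string[][], // text items for each page, in page order
  metadata: Record<string, unknown>,
  splitPages: boolean,
  parsedItemSeparator = "" // assumed default
): Doc[] {
  // One document per page, with the page's text items joined.
  const perPage: Doc[] = pages.map((items, i) => ({
    pageContent: items.join(parsedItemSeparator),
    metadata: { ...metadata, pageNumber: i + 1 }, // assumed metadata key
  }));

  if (splitPages) return perPage;
  if (perPage.length === 0) return [];

  // Otherwise merge all pages into a single document.
  return [
    {
      pageContent: perPage.map((d) => d.pageContent).join("\n\n"), // assumed separator
      metadata,
    },
  ];
}

const pages = [["Hello", "world"], ["Second", "page"]];
const split = pagesToDocuments(pages, { source: "a.pdf" }, true, " ");
const merged = pagesToDocuments(pages, { source: "a.pdf" }, false, " ");
console.log(split.length, merged.length); // 2 1
```

Keeping the per-page pass separate from the merge step means both modes share the same extraction logic and differ only in what is returned.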

`Static` imports

imports(): Promise<{
    readFile: {
        (path, options?): Promise<Buffer>;
        (path, options): Promise<string>;
        (path, options?): Promise<string | Buffer>;
    };
}>
Static method that dynamically imports the readFile function from the Node.js fs/promises module when it is needed. If the import fails, it throws an error indicating that the fs/promises module is not available in the current environment.

Returns Promise<{
    readFile: {
        (path, options?): Promise<Buffer>;
        (path, options): Promise<string>;
        (path, options?): Promise<string | Buffer>;
    };
}>
Promise that resolves with an object containing the readFile function.
Inherited from BufferLoader.imports
- Defined in langchain/src/document_loaders/fs/buffer.ts:60
