The data quality endpoint generates a structured report that surfaces documents and mappings that may require review or re-processing. The report is organized into named queues, each highlighting a different type of quality issue: documents with unusually few extracted line items, portfolio-only documents, documents with no extracted data, mappings flagged for review, numeric parse failures, and documents whose source PDFs are not available. You can scope the report to a single fund or review the entire dataset at once.

Endpoints

Method | Path | Description
GET | /data-quality | Returns a structured data quality report

Base URL: https://mkk-roan.vercel.app/api

GET /data-quality

Returns a DataQualityReport with aggregate summary statistics and per-queue lists of problematic documents or mappings.

Query parameters

fund_id
string
Scope the report to the fund with this internal database ID. Omit to report across all funds.
fund_code
string
Scope the report to the fund with this fund code (e.g., OJB). Use instead of fund_id when you know the fund code.
limit
integer
default: 50
Maximum number of items to return in each queue list (e.g., low_line_item_documents, empty_documents). Accepts values from 1 to 500.
low_line_item_threshold
integer
default: 10
Documents with fewer extracted line items than this threshold are added to the low_line_item_documents queue. Accepts values from 0 to 100. Lower this value to reduce false positives for funds with naturally sparse data.
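
The query parameters above can be combined into a request URL as in the following minimal Python sketch. The function name `build_data_quality_url` is hypothetical and not part of the API; the range checks simply mirror the documented limits (1 to 500 for limit, 0 to 100 for low_line_item_threshold) so that an out-of-range value fails locally rather than at the server.

```python
from urllib.parse import urlencode

BASE_URL = "https://mkk-roan.vercel.app/api"


def build_data_quality_url(fund_id=None, fund_code=None,
                           limit=50, low_line_item_threshold=10):
    """Build a GET /data-quality URL (hypothetical helper, not part of the API)."""
    # Mirror the documented ranges before sending anything over the network.
    if not 1 <= limit <= 500:
        raise ValueError("limit must be between 1 and 500")
    if not 0 <= low_line_item_threshold <= 100:
        raise ValueError("low_line_item_threshold must be between 0 and 100")

    params = {
        "limit": limit,
        "low_line_item_threshold": low_line_item_threshold,
    }
    # Scope filters are optional; omitting both reports across all funds.
    if fund_id is not None:
        params["fund_id"] = fund_id
    if fund_code is not None:
        params["fund_code"] = fund_code

    return f"{BASE_URL}/data-quality?{urlencode(params)}"


if __name__ == "__main__":
    # Report for fund code OJB with a lower sparse-document threshold.
    print(build_data_quality_url(fund_code="OJB", low_line_item_threshold=5))
```

The sketch always sends limit and low_line_item_threshold explicitly, which keeps the effective values visible in logs; omitting them would fall back to the documented defaults.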

Response schema

scope
object
required
The filters applied to generate this report.
limits
object
required
The effective limits used to generate the report.
summary
DataQualitySummary
required
Aggregate counts across all quality queues.
mapping_methods
MappingMethodSummary[]
required
Breakdown of line item mapping methods used across documents in scope.
low_line_item_documents
object[]
required
Documents with fewer extracted line items than the low_line_item_threshold. Each item includes document metadata and the actual line_item_count.
portfolio_only_documents
object[]
required
Documents that contain portfolio entries but have no extracted line item values. These may indicate parsing failures for the financial statement pages.
empty_documents
object[]
required
Documents with no extracted line item values and no portfolio entries. These are candidates for re-processing.
review_mappings
object[]
required
Line item mappings flagged for human review, typically because mapping confidence is below an internal threshold or the mapping method is fuzzy or model.
numeric_parse_failures
object[]
required
Line item values where the value string could not be parsed into a numeric_value. Each item includes the raw value string and its source document.
portfolio_numeric_parse_failures
object[]
required
Portfolio entries where one or more numeric fields (market value, nominal value, etc.) could not be parsed. Each item includes the raw field values and source document.
missing_pdfs
object[]
required
Documents for which no PDF binary is available in storage. The document has been indexed but the source file cannot be streamed.

Start your quality review with the summary object. If the empty_documents or low_line_item_documents count is high, inspect the corresponding queue lists to see which documents are affected. If many of the flagged documents belong to funds with naturally sparse data, lower low_line_item_threshold and re-run the report. Use review_mappings to find line item mappings that need manual verification.

Queue lists are capped at the limit parameter. The summary counts always reflect the total across all documents in scope, not just the items returned in the lists.
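
Because the lists are capped while the summary counts are totals, comparing the two shows which queues were cut off. The following is a minimal Python sketch under one assumption: that each summary key has the same name as its queue list, as in the example response on this page. The helper name `truncated_queues` is hypothetical.

```python
# Queue names as they appear in the response schema.
QUEUE_NAMES = [
    "low_line_item_documents",
    "portfolio_only_documents",
    "empty_documents",
    "review_mappings",
    "numeric_parse_failures",
    "portfolio_numeric_parse_failures",
    "missing_pdfs",
]


def truncated_queues(report):
    """Return {queue_name: (returned, total)} for queues capped by limit.

    Assumes summary[queue_name] holds the total for that queue, which
    matches the example response but is not stated in the schema.
    """
    summary = report["summary"]
    result = {}
    for name in QUEUE_NAMES:
        returned = len(report.get(name, []))
        # If a summary key is missing, treat the queue as not truncated.
        total = summary.get(name, returned)
        if total > returned:
            result[name] = (returned, total)
    return result
```

A non-empty result suggests re-running the report with a larger limit (up to 500) or scoping it to a single fund.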

Error responses

Status | Description
400 | Invalid parameter value (e.g., limit out of range or low_line_item_threshold outside 0 to 100).
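
A client can treat a 400 as a caller error and let any other failure propagate. The sketch below uses only the Python standard library; `InvalidParameterError` and `fetch_report` are hypothetical names, and the shape of the 400 response body is not documented here, so the code does not attempt to read it.

```python
import json
import urllib.error
import urllib.request


class InvalidParameterError(ValueError):
    """Raised when the API answers 400 (hypothetical client-side name)."""


def fetch_report(url, opener=urllib.request.urlopen):
    """Fetch a data quality report and decode the JSON body.

    `opener` is injectable so the function can be exercised without
    network access; by default it is urllib.request.urlopen.
    """
    try:
        with opener(url) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        if err.code == 400:
            # Documented case: a query parameter was out of range.
            raise InvalidParameterError(f"request rejected: {url}") from err
        raise  # Any other status is left to the caller.


if __name__ == "__main__":
    report = fetch_report("https://mkk-roan.vercel.app/api/data-quality?limit=50")
    print(report["summary"])
```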

Example requests

curl https://mkk-roan.vercel.app/api/data-quality

Example response

200
{
  "scope": {
    "fund_id": null
  },
  "limits": {
    "list_limit": 50,
    "low_line_item_threshold": 10
  },
  "summary": {
    "documents": 4821,
    "documents_with_line_items": 4650,
    "documents_with_portfolio": 4120,
    "documents_with_both": 3980,
    "portfolio_only_documents": 28,
    "empty_documents": 7,
    "low_line_item_documents": 143,
    "review_mappings": 312,
    "numeric_parse_failures": 89,
    "portfolio_numeric_parse_failures": 241,
    "missing_pdfs": 14
  },
  "mapping_methods": [
    {
      "mapping_method": "exact",
      "count": 820114,
      "average_confidence": 1.0
    },
    {
      "mapping_method": "fuzzy",
      "count": 48210,
      "average_confidence": 0.87
    },
    {
      "mapping_method": "manual",
      "count": 12840,
      "average_confidence": 1.0
    },
    {
      "mapping_method": "model",
      "count": 10266,
      "average_confidence": 0.79
    }
  ],
  "low_line_item_documents": [
    {
      "id": 2041,
      "fund_code": "AKB",
      "period": "2022-03",
      "line_item_count": 4,
      "file_name": "AKB_202203.pdf"
    }
  ],
  "portfolio_only_documents": [
    {
      "id": 3118,
      "fund_code": "YKB",
      "period": "2021-09",
      "file_name": "YKB_202109.pdf"
    }
  ],
  "empty_documents": [],
  "review_mappings": [
    {
      "id": "lv_08841",
      "document_id": 1042,
      "fund_code": "OJB",
      "period": "2023-06",
      "raw_label": "Toplam Fonun Net Varlik Degeri",
      "line_item_slug": "total-net-assets",
      "mapping_method": "fuzzy",
      "mapping_confidence": 0.81
    }
  ],
  "numeric_parse_failures": [
    {
      "id": "lv_09912",
      "document_id": 2301,
      "fund_code": "AKB",
      "period": "2022-07",
      "line_item_slug": "unit-price",
      "raw_value": "1.234,56*"
    }
  ],
  "portfolio_numeric_parse_failures": [],
  "missing_pdfs": [
    {
      "id": 188,
      "fund_code": "ESK",
      "period": "2019-12",
      "file_name": "ESK_201912.pdf"
    }
  ]
}
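
One way to read mapping_methods is as a rough indicator of review load: the larger the share of fuzzy and model mappings, the more entries are likely to land in review_mappings. The following Python sketch computes per-method shares from a parsed response. The function names are hypothetical, and treating fuzzy and model as the review-prone methods follows the review_mappings description rather than any documented rule.

```python
def mapping_method_shares(report):
    """Return {mapping_method: fraction of all mappings} for a parsed report."""
    methods = report["mapping_methods"]
    total = sum(m["count"] for m in methods)
    if total == 0:
        return {}
    return {m["mapping_method"]: m["count"] / total for m in methods}


def review_prone_share(report, methods=("fuzzy", "model")):
    """Fraction of mappings produced by methods assumed to need review."""
    shares = mapping_method_shares(report)
    return sum(shares.get(name, 0.0) for name in methods)


if __name__ == "__main__":
    # Counts taken from the example response above.
    example = {
        "mapping_methods": [
            {"mapping_method": "exact", "count": 820114, "average_confidence": 1.0},
            {"mapping_method": "fuzzy", "count": 48210, "average_confidence": 0.87},
            {"mapping_method": "manual", "count": 12840, "average_confidence": 1.0},
            {"mapping_method": "model", "count": 10266, "average_confidence": 0.79},
        ]
    }
    for method, share in mapping_method_shares(example).items():
        print(f"{method}: {share:.2%}")
    print(f"review-prone: {review_prone_share(example):.2%}")
```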