SAN FRANCISCO, Calif., June 16, 2026 — Probably has raised nine million dollars in seed funding from Andreessen Horowitz to develop systems that detect and block errors in large language system outputs before they reach users. The funding supports early product development and engineering work focused on reliability in production deployments of artificial intelligence systems.
The company is working on infrastructure that sits between the language system output and final application delivery. Rather than building another general-purpose generator, it focuses on verification and control of generated responses before they enter operational workflows.
Founder Peter Elias describes the work as building guardrails that prevent incorrect or inconsistent outputs from reaching end users. The target is higher reliability in environments where accuracy requirements are strict and errors carry an operational cost.
Factual Errors in Production Workflows
Large language systems can produce fluent responses that still contain factual errors or inconsistent information. These issues appear even in advanced systems and remain difficult to eliminate through generation alone.
In production use, these errors become more visible because outputs are often used directly in business workflows such as reporting, data extraction, or automated decision processes. Even small inconsistencies can disrupt downstream systems that rely on structured inputs.
Existing detection methods often depend on post hoc review or separate validation scripts. Because these methods differ from one application to the next, generated content ends up being handled inconsistently across systems.
Validation Layer Between Generation and Delivery
Probably builds a validation layer that evaluates outputs before they are accepted into downstream systems. This layer checks structure, consistency, and alignment with predefined rules.
Outputs are not treated as final until they pass validation checks. If a response does not match the required structure or contains inconsistencies, it is rejected or reprocessed. This creates a controlled pathway between generation and usage.
The system is designed to reduce reliance on manual correction or application-specific parsing logic. Instead of each workflow handling errors independently, validation rules can be defined once and applied across multiple use cases.
This setup allows generated content to be constrained by deterministic rules that verify structure and correctness before delivery into production systems.
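The pattern described above can be sketched in a few lines of Python. This is an illustration only, not Probably's code: the rule format, the function names (`require_fields`, `total_matches_items`, `validate`) and the invoice-style example are all assumptions made for the sake of the sketch. It shows deterministic rules being defined once and then applied to any raw model output, which is accepted only if every rule passes.

```python
import json
from typing import Callable, Optional

# Hypothetical rule type: takes a parsed output, returns an error message or None.
Rule = Callable[[dict], Optional[str]]


def require_fields(*names: str) -> Rule:
    """Build a structural rule that checks the listed keys are present."""
    def rule(output: dict) -> Optional[str]:
        missing = [n for n in names if n not in output]
        return f"missing fields: {missing}" if missing else None
    return rule


def total_matches_items(output: dict) -> Optional[str]:
    """Consistency rule: the stated total must equal the sum of the items."""
    items = output.get("items")
    if not isinstance(items, list) or not all(
        isinstance(x, (int, float)) for x in items
    ):
        return "items must be a list of numbers"
    if sum(items) != output.get("total"):
        return "total does not equal sum of items"
    return None


def validate(raw: str, rules: list[Rule]) -> tuple[bool, list[str]]:
    """Treat raw model output as untrusted: parse it, then run every rule."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError as exc:
        return False, [f"not valid JSON: {exc.msg}"]
    if not isinstance(parsed, dict):
        return False, ["top-level value must be an object"]
    errors = [msg for rule in rules if (msg := rule(parsed)) is not None]
    return not errors, errors


# Rules are defined once and can be reused by any workflow that needs them.
INVOICE_RULES: list[Rule] = [require_fields("items", "total"), total_matches_items]

if __name__ == "__main__":
    print(validate('{"items": [2, 3], "total": 5}', INVOICE_RULES))
    print(validate('{"items": [2, 3], "total": 6}', INVOICE_RULES))
```

In this sketch a caller would pass only accepted outputs downstream and send rejected ones back for reprocessing along with the list of error messages.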
Data Science Tool and Structured Output Verification
The company’s first product is a data science tool designed to produce answers from structured datasets. Each output includes citations and an audit trail that shows how the result was derived from source data.
To reduce errors, outputs pass through a validation system that checks whether results match dataset constraints. If mismatches occur, the system rejects the output and reprocesses it until it satisfies the required rules.
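A reject-and-reprocess loop of this kind might look like the following minimal sketch. It is not the company's implementation: the `generate` callback stands in for a model call, the answer format (`column`, `sum`) is invented for illustration, and the `max_attempts` cap is an added assumption so that the loop cannot run forever. The check recomputes the claimed figure from the source rows, and every attempt is recorded in a simple log.

```python
from typing import Callable, Optional


def check_against_dataset(answer: dict, rows: list[dict]) -> Optional[str]:
    """Return an error message if the claimed answer disagrees with the data."""
    column = answer.get("column")
    if not rows or any(column not in row for row in rows):
        return f"unknown column: {column!r}"
    expected = sum(row[column] for row in rows)
    if answer.get("sum") != expected:
        return f"claimed sum {answer.get('sum')} != recomputed sum {expected}"
    return None


def answer_with_validation(
    generate: Callable[[Optional[str]], dict],  # hypothetical model call
    rows: list[dict],
    max_attempts: int = 3,  # assumption: bound the number of retries
) -> tuple[dict, list[dict]]:
    """Regenerate until the answer satisfies the dataset check, logging each try."""
    audit: list[dict] = []
    feedback: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        candidate = generate(feedback)  # the previous error is passed back as a hint
        error = check_against_dataset(candidate, rows)
        audit.append({"attempt": attempt, "candidate": candidate, "error": error})
        if error is None:
            return candidate, audit
        feedback = error
    raise ValueError(f"no valid answer after {max_attempts} attempts: {feedback}")


if __name__ == "__main__":
    data = [{"revenue": 10}, {"revenue": 15}]
    # Stub generator: wrong on the first try, correct on the second.
    replies = iter([
        {"column": "revenue", "sum": 20},
        {"column": "revenue", "sum": 25},
    ])
    result, log = answer_with_validation(lambda fb: next(replies), data)
    print(result, len(log))
```

Because correctness is enforced by the recomputation rather than by the generator, the same loop works regardless of which model sits behind `generate`.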
This method is described internally as a “data science mech suit,” referring to the way validation rules wrap around model outputs and constrain behavior. The system also aligns generation behavior with the same rules used for verification.
The company reports that stronger validation can reduce dependence on highly capable models. In some configurations, smaller models can be used while still maintaining reliable output because validation handles correctness enforcement.
According to the company, this also reduces compute requirements and lowers token costs, since smaller models can run on local hardware rather than on large cloud infrastructure.
Enterprise Use Cases and Structured Reliability
The same validation architecture is intended for use beyond data science workflows. Potential applications include accounting systems, compliance tools, and healthcare workflows where structured outputs and traceability are required.
In these settings, even minor errors can lead to operational disruption. Validation systems reduce that risk by checking outputs before they are accepted into downstream processes.
Probably treats model output as untrusted until it passes verification rules. Each response must satisfy structure and consistency checks before it enters production systems.
Founder Peter Elias has said that major AI providers have not prioritized this type of constraint-based reliability system, noting that incentives often favor repeated usage rather than reducing correction cycles.
That untrusted-by-default principle guides how the product evaluates, filters, and standardizes responses before they reach downstream systems.