Proposal to extend the "Citation" API object with more details

I’d like to propose extending the “Citation” API object with detailed information that helps confirm the claim the citation supports. From browsing the API, it looks like I can currently pass only the URL that confirms the claim when creating a Citation object. In the UI, by contrast, I can provide more detail when adding a citation by selecting its type, such as a web page or a news article. That seems to create a “Source” object, which I don’t see in the API, so there’s a chance that what I’m about to propose already exists.

I think it would be useful to let API users pass detailed predicate extraction definitions for website Sources. Such an extraction definition should list the steps needed to gather the information that confirms a predicate. It could be as simple as “Step 1: Open a URL, Step 2: Extract an element from the page”, or much more elaborate.
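To make the idea concrete, here is a minimal sketch of what such a definition could look like, along with a tiny runner for it. Everything here is hypothetical: `ExtractionStep`, `PageLike`, `runExtraction`, the URL, and the selector are names I made up for illustration, and none of them exist in the current API.

```typescript
// Hypothetical step vocabulary; none of these names exist in the current API.
type ExtractionStep =
  | { action: "open"; url: string }
  | { action: "extractText"; selector: string };

// Minimal page surface the runner needs. Playwright's Page has compatible
// goto() and locator() methods, but any object with this shape works.
interface PageLike {
  goto(url: string): Promise<unknown>;
  locator(selector: string): { innerText(): Promise<string> };
}

// Runs the steps in order and returns the trimmed text of the last
// extraction, or null if no step extracted anything.
async function runExtraction(
  page: PageLike,
  steps: ExtractionStep[],
): Promise<string | null> {
  let value: string | null = null;
  for (const step of steps) {
    switch (step.action) {
      case "open":
        await page.goto(step.url);
        break;
      case "extractText":
        value = (await page.locator(step.selector).innerText()).trim();
        break;
    }
  }
  return value;
}

// The two-step example from above, with a placeholder URL and selector.
const exampleSteps: ExtractionStep[] = [
  { action: "open", url: "https://example.com/" },
  { action: "extractText", selector: "h1" },
];
```

A declarative list like this is easy to store alongside a Citation and to validate before execution; a more elaborate definition would only need more step variants.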

I think it would be useful to follow an existing open format for such steps. Personally, I think [Playwright](https://playwright.dev/) test case definitions could be a good starting point. That way, API users could provide web automation scripts that evaluate to the expected predicate value. If the knowledge graph can be built by automated agents, such agents should be able to provide a detailed description of how they derived this knowledge.

Playwright tests are an interesting choice here for two reasons:

  • The test case definition language allows building complex web automation configurations.
  • The [Playwright Trace Viewer](https://playwright.dev/docs/trace-viewer) records tracing information that helps confirm the state of the website at the time the trace was executed, and it allows visual playback of that information. Such a replay could be presented to validators as a means of confirming the triple.
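As a rough illustration of both points, a citation for a single triple could be expressed as a Playwright test along these lines. The URL, selector, and expected value are placeholders I picked for the example, not anything taken from the actual knowledge graph, and how such a script would be attached to a Citation is left open.

```typescript
import { test, expect } from "@playwright/test";

// Record a trace for every run so it can be replayed in the Trace Viewer.
test.use({ trace: "on" });

// Placeholder triple: (https://example.com/, "main heading", "Example Domain").
test("citation confirms the predicate value", async ({ page }) => {
  // Step 1: open the source URL.
  await page.goto("https://example.com/");
  // Step 2: extract the element and compare it with the expected value.
  await expect(page.locator("h1")).toHaveText("Example Domain");
});
```

The trace archive produced by a run can then be opened with `npx playwright show-trace <path-to-trace.zip>`, which is the kind of replay a validator could step through.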

Having such detailed information also makes it possible to detect when a citation stops evaluating to the value of a predicate, which could signal a potential change in the knowledge graph. This signal could be valuable information for future agents to act upon.
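A periodic re-check could be as small as the sketch below. The names `recheckCitation`, `CheckResult`, and the three status values are assumptions of mine for illustration only; `evaluate` stands in for whatever actually executes the extraction steps (for example, a Playwright run).

```typescript
type CheckStatus = "confirmed" | "changed" | "failed";

interface CheckResult {
  status: CheckStatus;
  expected: string;
  observed: string | null;
}

// Re-runs a citation's extraction and compares the result with the stored
// predicate value. A null result from `evaluate` counts as "changed".
async function recheckCitation(
  expected: string,
  evaluate: () => Promise<string | null>,
): Promise<CheckResult> {
  try {
    const observed = await evaluate();
    const status: CheckStatus = observed === expected ? "confirmed" : "changed";
    return { status, expected, observed };
  } catch {
    // The extraction itself broke: page unreachable, selector gone, timeout, etc.
    return { status: "failed", expected, observed: null };
  }
}
```

Keeping "changed" separate from "failed" lets an agent tell a source that now says something different apart from a source that can no longer be evaluated at all.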
