Predicate Proposal - duplicate of

Value statement: The ‘duplicate of’ predicate will be leveraged to propose duplicates in the protocol and enable verifiers to vote on whether or not two entities are duplicates of each other. Deduplication will generally improve the protocol’s data quality (to the benefit of protocol consumers and those submitting new data) and reduce gaming opportunities related to intentional duplication.

Definition: “An entity A is a ‘duplicate of’ another entity B when A and B refer to the same entity. If ‘entity A’ → ‘duplicate of’ → ‘entity B’ is accepted, entity B will remain and entity A will no longer accept new statements or allow its triples to be validated. The earliest created entity in the pair will be used as the object of the ‘duplicate of’ triple, and is thus the entity that remains when a ‘duplicate of’ triple is accepted and deduplication occurs. Entities that are highly related but not direct duplicates of each other (e.g. a subsidiary company and its parent company) should not be marked as duplicates.”
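
As a rough illustration of the orientation rule in the definition, the sketch below picks the subject and object of a ‘duplicate of’ triple from creation time alone. The `Entity` class, its `created_at` field, and the tie-break on `entity_id` are assumptions made for the example, not part of the proposal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    entity_id: str   # hypothetical identifier
    created_at: int  # hypothetical creation time, e.g. a block number


def propose_duplicate(a: Entity, b: Entity) -> tuple[str, str, str]:
    """Return (subject, predicate, object) with the earliest entity as object."""
    if a.entity_id == b.entity_id:
        raise ValueError("an entity cannot be a duplicate of itself")
    # Assumption: ties on creation time are broken by entity_id.
    survivor, duplicate = sorted((a, b), key=lambda e: (e.created_at, e.entity_id))
    return (duplicate.entity_id, "duplicate of", survivor.entity_id)
```

Whichever order the two entities are passed in, the later-created one ends up as the subject, so the same pair always yields the same triple.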

Tooltip definition of the predicate: “The subject entity is a duplicate of the object entity.”

Type of value: Entity

For enums all possible values: N/A

# of accepted values: One. If a ‘duplicate of’ value is accepted for an entity, new triples where that entity is the subject will no longer be accepted.
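
A minimal sketch of this rule, assuming a hypothetical in-memory `Graph` class (the actual protocol lives in contracts whose interface is not described here, so every name below is illustrative):

```python
class Graph:
    """Toy in-memory stand-in for the protocol; all names are hypothetical."""

    def __init__(self) -> None:
        self.triples: list[tuple[str, str, str]] = []
        self.duplicate_of: dict[str, str] = {}  # deactivated subject -> survivor

    def accept_duplicate(self, subject: str, obj: str) -> None:
        # Only one 'duplicate of' value may be accepted per entity.
        if subject in self.duplicate_of:
            raise ValueError(f"{subject} already has an accepted 'duplicate of' value")
        self.duplicate_of[subject] = obj
        self.triples.append((subject, "duplicate of", obj))

    def add_triple(self, subject: str, predicate: str, value: str) -> bool:
        # A deactivated entity no longer accepts new triples as the subject.
        if subject in self.duplicate_of:
            return False
        self.triples.append((subject, predicate, value))
        return True
```

After `accept_duplicate("A", "B")`, calls to `add_triple` with `"A"` as the subject return `False`, while triples about `"B"` are still accepted.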

Inverse Properties and Name: N/A

Examples of proper use:

  • “Company A” “duplicate of” “Company B” when those two companies are the same - this may be indicated by the entities having the same founders and CEO, the same employees, the same investors, the same funding rounds, etc.
  • “Person A” “duplicate of” “Person B” when those two entities are referring to the same person - this may be indicated by the two entities having the same social profiles, the same date of birth, the same work history, the same or similar locations, etc.

Examples of improper use:

  • “Google” → “duplicate of” → “Google Ventures” would be incorrect, as Google Ventures is a separate subsidiary of Google and thus deserves its own entity page.
  • “James Bond” → “duplicate of” → “Daniel Craig” would be incorrect, as Daniel Craig and James Bond do not represent the same entity - one is a fictional character and the other is a real person. The two are only highly related, as Daniel Craig has played James Bond in several films.

Usage in other schemas:

  • None

Constraints: None (‘duplicate of’ should be possible to apply to any entity)

Qualifiers that apply to this predicate: N/A

Citation Required?: No

Also known as: ‘duplicates’, ‘is same as’

Suggested applicable templates on golden.com: N/A

great proposal! how will the triples be transferred to another entity?

In some cases, a company has been taken over and its official website now redirects to that of the new company. How should that case be handled? Could we create an M&A (mergers and acquisitions) entity?

What will the mechanics of the merger be for duplicates that have already been created?

Who will assign this to an entity? That is, the person who created the duplicate either did not realize that the entity being created was already a duplicate, or did it intentionally.

Having this predicate defeats the purpose of merging duplicates.

If an entity is a duplicate, it would be better to just flag it as such and vote on whether it should be merged or not.

Hey! Sounds really innovative and might be a way out. At the same time I would like to ask a few questions:

  1. Could you please provide an example of proper use? The current version only gives a theoretical example; it would be good to look at a concrete situation.

  2. The main aim of this predicate is to combat gaming through adding the same entities. However, given that any predicate is a triple by nature, users could create new duplicates and mark them with the ‘duplicate of’ predicate. So it could potentially lead to more duplicates being created on purpose. Do you not regard this as a potential risk?

  3. “The earliest created entity in the pair will be used as the object.” What if the earliest created entity is not the most filled? Say one user creates entity A, adding only its website and Facebook page. Another user, not knowing it already exists, creates the same entity as B and fills in much more information. Other users then see that A is much less complete than B, so they keep adding to B rather than A. We end up with A created earlier but poorly filled, and B created later but much better filled with various triples. If we mark B as a ‘duplicate of’ A, then A is accepted and B is automatically deleted. We are then left with a sparse entity A and have to wait for users to fill it in again, so I think we risk losing valuable information (in other words, lots of filled triples). Is it possible to automatically fill entity A with the triples from B and from any other duplicates, if there are several? Otherwise it could lead to duplicated work.

in v0 - there may be no automatic transfer of triples from the ‘deactivated’ entity (that is, the subject of the ‘duplicate of’ triple). Instead, this would be left as an opportunity for diligent contributors and programmatic agents to bring this data over. Automatically moving all the triples over with attribution given to the original author is an extreme constraint in the context of representing structured data on-chain.

later - the dapp could allow this transfer to be done fairly easily in the UI

even later - a mechanic could give some holding period during which the original author would have permission to re-add their statements from the ‘deactivated’ subject entity in the ‘duplicate of’ triple to the object entity
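
For the v0 case, a programmatic agent could work out which statements to re-submit against the surviving entity roughly as follows. This is a sketch only: triples are modelled as plain `(subject, predicate, value)` tuples and the function name is made up; it does not address attribution or on-chain submission.

```python
Triple = tuple[str, str, str]  # (subject, predicate, value), an assumed shape


def triples_to_carry_over(
    triples: list[Triple], deactivated: str, survivor: str
) -> list[Triple]:
    """List the deactivated entity's statements that the survivor still lacks."""
    seen = {(p, v) for s, p, v in triples if s == survivor}
    carried: list[Triple] = []
    for s, p, v in triples:
        if s != deactivated or p == "duplicate of":
            continue  # skip other entities and the dedup triple itself
        if (p, v) in seen:
            continue  # the survivor already states this
        seen.add((p, v))
        carried.append((survivor, p, v))
    return carried
```

The result is only a list of candidate statements; each would still need to be submitted and validated like any other new triple.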

I would expect the case you describe to not be a valid use of a ‘duplicate of’ triple. Rather, original company entity A should have some connection with the new company entity B, and a valid time qualifier could be added to the website triple for the original company entity A to note that it no longer applies to it, but instead is now used to refer to company entity B

There will be edge cases here

I’m not sure I understand what you mean. This ‘flagging’ feature you refer to is being modeled as a predicate given the architecture of the current contracts.

In the event of a successful vote for a duplicate, what sanctions will be imposed on the creator of the duplicate?

  1. Unfortunately I can’t - an example of proper use should only exist temporarily, as the entities themselves should be deduped…

  2. “Users could create new duplicates and mark them with the ‘duplicate of’ predicate. So it could potentially lead to more duplicates being created on purpose. Do you not regard this as a potential risk?”

Very good point. Will need to think about the reward function here. It will likely be necessary to impose a cost to create new entities so that this strategy can’t be pursued - or in v0 to at least ensure you can’t apply the ‘duplicate of’ predicate to entities you yourself create, which would easily enable this strategy

  3. What if the earliest created entity is not the most filled?

Was considering this - ideally we would let the most filled entity be the one that’s preserved, I’d agree. But the gaming consequences related to this are troublesome

  • from an already dense entity A, I could copy it to a new entity I create B, and add three triples to entity B. When entity A and entity B are dedupped, entity B would remain and I have effectively
  • for users looking to add data to an existing entity, if they find entities A and B exist and are duplicates, they won’t have confidence in which one they should contribute data to. If the earliest created entity is always chosen as the object of the ‘duplicate of’ triple, users who are willing to do the work of figuring out which of entity A and B was created earliest will be rewarded by having their contribution stay on that entity indefinitely (as that entity should perpetually be the ‘surviving’ entity in the case of ‘duplicate of’ triples that involve that entity)
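
The v0 restriction mentioned under point 2 could look something like the check below. It takes the stricter reading, rejecting a proposal if the proposer created either entity in the pair; the `creators` mapping and the function name are hypothetical.

```python
def may_propose_duplicate(
    proposer: str, subject: str, obj: str, creators: dict[str, str]
) -> bool:
    """Return True if `proposer` created neither entity in the proposed pair."""
    # Assumption: `creators` maps an entity id to the id of the user who created it.
    return proposer not in (creators[subject], creators[obj])
```

A looser variant would only compare the proposer against the creator of the subject entity, since that is the one that gets deactivated.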

It’s a good question - in the long term, users will likely need to pay to mint the entity NFT around the entity they’re creating, so creating an entity that goes on to be deduplicated should result in the creator of that entity paying more to mint that entity than what they gain (on average) from creating that entity and adding statements to it.

We could also think about decreasing a user’s reputation score when an entity that user created has been marked as a duplicate. We can do this with a high degree of confidence it will be ‘fair’ - the deactivated subject entity in the ‘duplicate of’ triple was definitely made at a time when the surviving object entity was present in the graph, and thus the creator of the deactivated subject entity can be fairly punished.
