Initial Checks
- [X] I have searched Google & GitHub for similar requests and couldn't find anything
- [X] I have read and followed the docs and still think this feature is missing
Description
I'm not really sure if this is a feature request or a bug to be honest, but I'm reasonably certain it doesn't belong in the discussion forum.
I have a traditional Pydantic model that looks something like this:
from __future__ import annotations
from enum import Enum
from pydantic import BaseModel, Extra, Field
from typing import Any
class Status(Enum):
    UNKNOWN = "Unknown"
    TRUE = "True"
    FALSE = "False"
    @classmethod
    def __modify_schema__(cls, schema: dict[str, Any]) -> None:
        # Pydantic isn't able to determine this type by itself during schema generation.
        schema["type"] = "string"
class Condition(BaseModel, frozen=True, extra=Extra.forbid, allow_population_by_field_name=True):
    type_: str = Field(alias="type")
    status: Status
Critically, one of its fields is typed as an enum.
I have another class that is a custom data type. It's essentially a set dedicated to Condition objects:
from collections.abc import Iterable, Iterator, Set
from pydantic.typing import CallableGenerator
class Conditions(Set[Condition]):
    def __init__(self, conditions: Iterable[Condition]):
        self._conditions = frozenset(conditions)
    @classmethod
    def __get_validators__(cls) -> CallableGenerator:
        yield cls.validate
    @classmethod
    def validate(cls, value: Any) -> Conditions:
        if isinstance(value, Conditions):
            return value
        if not isinstance(value, Iterable):
            raise TypeError("Must provide an iterable of Condition objects.")
        def iter_check(value: Iterable[Any]) -> Iterator[Condition]:
            for i, condition in enumerate(value, start=1):
                if isinstance(condition, Condition):
                    yield condition
                else:
                    raise TypeError(f"Entry {i} is not a Condition object: {condition!r}")
        return cls(iter_check(value))
    def __iter__(self) -> Iterator[Condition]:
        return iter(self._conditions)
    def __contains__(self, value: Any) -> bool:
        return value in self._conditions
    def __len__(self) -> int:
        return len(self._conditions)
All of this works fine, except when it comes time to generate a JSON schema. For context, this Conditions class is used as the type of a field in a much bigger Pydantic model, and I'm generating the schema from that model.
The first problem is that Pydantic doesn't understand the inheritance from collections.abc.Set at all. There is no schema generated for the class by default, and in fact attempting to do so produces an error:
ValueError: Value not declarable with JSON Schema, field: name='conditions' type=Conditions required=True
Ideally Pydantic would understand that I've extended collections.abc.Set, and that the generic type (Condition) is a Pydantic model, and then it could in fact generate a full schema from that automatically. However it doesn't right now, and as support will probably never arrive in Pydantic 1.10, I need to keep searching for a solution.
The next thing to try is writing a __modify_schema__ method for the Conditions class:
@classmethod
def __modify_schema__(cls, schema: dict[str, Any]) -> None:
    schema["type"] = "array"
    schema["uniqueItems"] = True
    schema["items"] = {}
Writing just that code works, but it's sub-optimal: the empty items schema accepts anything, so the generated schema says nothing about the Condition type that we know we're wrapping. Unfortunately, I can't find a good way of handling this, only things that amount to hackz.
I attempted to replace the empty dictionary with this:
schema["items"] = Condition.schema()
It "works" in that the code runs, but the full generated schema for the overall model is invalid. The problem is that Condition.schema() lacks context of the overall schema generation operation. It generates a definitions object (and $refs pointing at the definitions object), but all of that is nested inside an object. The generated $refs point at things that don't exist.
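To make the failure concrete, here is a minimal sketch that doesn't need Pydantic at all. The `condition_schema` dict is hand-written to look roughly like what Condition.schema() returns in Pydantic 1.x (the exact keys are an assumption), and `iter_refs`, `resolves` and `find_dangling_refs` are hypothetical helpers, not part of Pydantic. It nests the standalone schema under items and lists the $refs that no longer resolve from the document root:

```python
from __future__ import annotations

from typing import Any, Iterator

# Hand-written approximation of Condition.schema(); not real Pydantic output.
condition_schema: dict[str, Any] = {
    "title": "Condition",
    "type": "object",
    "properties": {
        "type": {"title": "Type", "type": "string"},
        "status": {"$ref": "#/definitions/Status"},
    },
    "required": ["type", "status"],
    "definitions": {
        "Status": {
            "title": "Status",
            "enum": ["Unknown", "True", "False"],
            "type": "string",
        },
    },
}


def iter_refs(node: Any) -> Iterator[str]:
    """Yield every string-valued "$ref" found anywhere in a schema."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "$ref" and isinstance(value, str):
                yield value
            else:
                yield from iter_refs(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_refs(item)


def resolves(root: dict[str, Any], ref: str) -> bool:
    """Check whether a local "#/a/b" pointer exists under root.

    Simplification: JSON Pointer escapes (~0, ~1) and remote refs are ignored.
    """
    if not ref.startswith("#/"):
        return False
    node: Any = root
    for part in ref[2:].split("/"):
        if not isinstance(node, dict) or part not in node:
            return False
        node = node[part]
    return True


def find_dangling_refs(root: dict[str, Any]) -> list[str]:
    """Return the sorted, de-duplicated refs that do not resolve from root."""
    return sorted({ref for ref in iter_refs(root) if not resolves(root, ref)})


# The standalone schema is internally consistent...
standalone_dangling = find_dangling_refs(condition_schema)

# ...but once nested, its "definitions" object is no longer at the root.
outer_schema = {
    "title": "Outer",
    "type": "object",
    "properties": {
        "conditions": {
            "type": "array",
            "uniqueItems": True,
            "items": condition_schema,
        },
    },
}
nested_dangling = find_dangling_refs(outer_schema)
```

In this sketch `standalone_dangling` is empty, while `nested_dangling` contains `#/definitions/Status`: the pointer is still written relative to the root, but the definitions object now lives under properties/conditions/items.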
This is where the title of this issue comes in: the schema generation isn't composable. If there was a way to tell the schema generation code "don't use a global definitions store to generate this schema", I could trivially nest one schema inside another.
Alas, this isn't possible, and attempting to make that possible would likely require a significant refactoring of the existing schema generation code, so I keep on searching. My next idea is to keep doing schema["items"] = Condition.schema(), and just fix the generated schema in the Condition class' __modify_schema__ method. This way the hackz is kept local and limited in scope, and avoids needing to reinvent the wheel wherever it may be used.
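For illustration, this is roughly the shape such a local fix-up could take, sketched on plain dicts rather than wired into Pydantic. `inline_definitions` is a hypothetical helper (nothing like it ships with Pydantic as far as I know), and the sample `condition_schema` is a hand-written approximation of the real output. It assumes every $ref is a local `#/definitions/<name>` pointer and refuses recursive definitions, since those can't be inlined:

```python
from __future__ import annotations

from typing import Any

PREFIX = "#/definitions/"


def inline_definitions(schema: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of schema with local refs replaced by their targets.

    The top-level "definitions" object is dropped from the result, so the
    returned schema can be nested inside another schema without leaving
    pointers that only made sense at the original root.
    """
    definitions = schema.get("definitions", {})

    def expand(node: Any, active: tuple[str, ...]) -> Any:
        if isinstance(node, list):
            return [expand(item, active) for item in node]
        if not isinstance(node, dict):
            return node
        ref = node.get("$ref")
        if isinstance(ref, str) and ref.startswith(PREFIX):
            name = ref[len(PREFIX):]
            if name in active:
                raise ValueError(f"Cannot inline recursive reference: {ref}")
            if name not in definitions:
                raise KeyError(f"Unknown definition: {name}")
            target = expand(definitions[name], active + (name,))
            # Keys that sat next to "$ref" are kept and override the target.
            siblings = {k: expand(v, active) for k, v in node.items() if k != "$ref"}
            return {**target, **siblings}
        return {k: expand(v, active) for k, v in node.items()}

    # Only the top-level "definitions" key is removed; a property that
    # happens to be called "definitions" deeper in the schema is untouched.
    body = {k: v for k, v in schema.items() if k != "definitions"}
    return expand(body, ())


# Hand-written approximation of Condition.schema(); not real Pydantic output.
condition_schema: dict[str, Any] = {
    "title": "Condition",
    "type": "object",
    "properties": {
        "type": {"title": "Type", "type": "string"},
        "status": {"$ref": "#/definitions/Status"},
    },
    "required": ["type", "status"],
    "definitions": {
        "Status": {
            "title": "Status",
            "enum": ["Unknown", "True", "False"],
            "type": "string",
        },
    },
}

inlined = inline_definitions(condition_schema)
```

The result has no definitions object and no $ref, so assigning it to schema["items"] would not leave dangling pointers. The trade-off is duplication: a definition referenced from several places is copied into each of them.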
Unfortunately for whatever reason, the __modify_schema__ method of the class you call .schema() on doesn't get called. Instead, you're expected to use schema_extra in the model's config. The schema_extra config option can either be a static dictionary, or a callable. I don't really understand why the distinction is necessary. When using kwargs for model config (as opposed to the nested Config class), passing a callable just feels weird. I suspect this approach will work, though I haven't tried it yet. Getting to this point is what prompted me to create the issue.
To summarise, there are three problems here:
- Pydantic doesn't understand the meaning of a custom data type that inherits from collections.abc.Set, and isn't able to use its generic type in schema generation.
- There is no way to generate a composable schema that doesn't use $ref.
- The __modify_schema__ method is not called when generating that model's schema specifically, for no discernible reason.
Please let me know if you require any further context for the code samples I've provided, or if you need any other information I might have missed :pray:
Affected Components
feature request