An Introduction to PyObject and Related Objects in CPython

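1. How CPython defines objects

Everything in Python is an object, so it is worth understanding how objects are defined. This article walks through the basic object structures in the CPython source code.

Note: all code below is taken from the 3.8 branch of the CPython source; the C implementation may differ in other branches.

Object definitions in CPython live mainly in two files:

  • Header file: Include/object.h
  • Source file: Objects/object.c

1.1. The overall structure of objects in CPython

Near the top of the header file Include/object.h there is an overview comment, excerpted here: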

Objects are structures allocated on the heap. Special rules apply to the use of objects to ensure they are properly garbage-collected. Objects are never allocated statically or on the stack; they must be accessed through special macros and functions only. (Type objects are exceptions to the first rule; the standard types are represented by statically initialized type objects, although work on type/class unification for Python 2.2 made it possible to have heap-allocated type objects too).

An object has a ‘reference count’ that is increased or decreased when a pointer to the object is copied or deleted; when the reference count reaches zero there are no references to the object left and it can be removed from the heap.

An object has a ‘type’ that determines what it represents and what kind of data it contains. An object’s type is fixed when it is created. Types themselves are represented as objects; an object contains a pointer to the corresponding type object. The type itself has a type pointer pointing to the object representing the type ‘type’, which contains a pointer to itself!.

Objects do not float around in memory; once allocated an object keeps the same size and address. Objects that must hold variable-size data can contain pointers to variable-size parts of the object. Not all objects of the same type have the same size; but the size cannot change after allocation. (These restrictions are made so a reference to an object can be simply a pointer – moving an object would require updating all the pointers, and changing an object’s size would require moving it if there was another object right next to it.)

Objects are always accessed through pointers of the type PyObject *. The type PyObject is a structure that only contains the reference count and the type pointer. The actual memory allocated for an object contains other data that can only be accessed after casting the pointer to a pointer to a longer structure type. This longer type must start with the reference count and type fields; the macro PyObject_HEAD should be used for this (to accommodate for future changes). The implementation of a particular object type can cast the object pointer to the proper type and back.

A standard interface exists for objects that contain an array of items whose size is determined when the object is allocated.

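Key points from the excerpt:

  1. Objects are structures allocated on the heap.
  2. Objects are never allocated statically or on the stack, and they must be accessed only through special macros and functions. (Type objects are the exception to the first rule: the standard types are statically initialized.)
  3. An object's reference count is increased or decreased when a pointer to the object is copied or deleted; when the count reaches zero, the object can be removed from the heap.
  4. Every object has a type, which determines what it represents and what kind of data it contains. The type is fixed when the object is created.
  5. Objects do not move in memory: once allocated, an object keeps the same size and address. Objects that must hold variable-size data can contain pointers to the variable-size parts.
  6. CPython always accesses objects through pointers of type PyObject *. PyObject is a structure that contains only the reference count and the type pointer.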
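Several of these points can be observed from Python itself. The following is a minimal sketch, assuming a CPython interpreter, where id() happens to return the object's memory address and sys.getrefcount() reports the reference count. The class Point is a hypothetical example and not part of CPython.

```python
import sys


class Point:
    """Hypothetical example class used only for this illustration."""


obj = Point()
address_before = id(obj)                  # CPython detail: id() is the memory address
count_before = sys.getrefcount(obj)

alias = obj                               # copying the pointer adds one reference
count_with_alias = sys.getrefcount(obj)

del alias                                 # deleting the pointer removes that reference
count_after_del = sys.getrefcount(obj)

print(count_with_alias - count_before)    # change caused by the extra pointer
print(count_after_del == count_before)    # count returns to its earlier value
print(id(obj) == address_before)          # the object has not moved
print(type(obj) is Point, type(Point) is type, type(type) is type)
```

Only the differences between the counts are meaningful here, because sys.getrefcount() itself holds a temporary reference while it runs. The last line shows the chain described in the excerpt: an instance points to its type, the type points to type, and type points to itself.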

1.2. The most important object: PyObject

As the overview shows, PyObject is central. It is defined in Include/object.h as follows:

/* Nothing is actually declared to be a PyObject, but every pointer to
 * a Python object can be cast to a PyObject*.  This is inheritance built
 * by hand.  Similarly every pointer to a variable-size Python object can,
 * in addition, be cast to PyVarObject*.
 */
typedef struct _object {
    _PyObject_HEAD_EXTRA
    Py_ssize_t ob_refcnt;
    struct _typeobject *ob_type;
} PyObject;

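Explanation:

  1. _PyObject_HEAD_EXTRA: only expands to extra fields when Py_TRACE_REFS is enabled; normal builds do not use it, so it can be ignored here.
  2. ob_refcnt: the object's reference count. Its type, Py_ssize_t, is a signed integer type and needs no special attention.
  3. ob_type: a pointer to the object's type object (be sure to keep type objects and instance objects apart). The struct it points to, struct _typeobject, is especially important and is defined in Include/cpython/object.h.

Note:

struct _typeobject is extremely important but also very long, so it is covered separately below.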
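To make the ob_type field concrete, the sketch below reads it straight out of an object's memory with ctypes. It rests on two assumptions that hold for the 3.8 definition quoted above but are not guaranteed elsewhere: id() returns the object's address, and ob_type is the last member of PyObject, so its offset is sizeof(PyObject) minus one pointer. The helper read_ob_type is a hypothetical name introduced for illustration.

```python
import ctypes

POINTER_SIZE = ctypes.sizeof(ctypes.c_void_p)

# object.__basicsize__ is sizeof(PyObject). Measuring from the end of the
# struct keeps the offset valid even if extra fields precede ob_refcnt.
OB_TYPE_OFFSET = object.__basicsize__ - POINTER_SIZE


def read_ob_type(obj):
    """Return the raw ob_type pointer stored in obj's header (CPython only)."""
    return ctypes.c_void_p.from_address(id(obj) + OB_TYPE_OFFSET).value


items = [1, 2, 3]
print(read_ob_type(items) == id(list))   # an instance points to its type object
print(read_ob_type(list) == id(type))    # a type object points to type
print(read_ob_type(type) == id(type))    # type points to itself
```

Reading memory this way is only for exploration; ordinary code should use type(obj), which returns the same type object safely.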

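1.3. The commonly used object PyVarObject

Besides the basic PyObject, CPython also has PyVarObject, which extends it for objects that hold a variable number of items, such as list and tuple. PyVarObject is defined as follows: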

typedef struct {
    PyObject ob_base;
    Py_ssize_t ob_size; /* Number of items in variable part */
} PyVarObject;

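Explanation:

  1. It embeds the complete PyObject structure as its first member, named ob_base.
  2. It then adds an ob_size member, which holds the number of items in the variable part. For a list object, this is the number of elements in the list.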
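The ob_size field can be checked in the same exploratory way. This is a sketch under the assumption that the interpreter is CPython and that ob_size immediately follows the embedded PyObject, as in the definition above. The helper read_ob_size is a hypothetical name, and it is only meaningful for types that really start with a PyVarObject header, such as list and tuple.

```python
import ctypes

# PyVarObject is a PyObject followed by ob_size, so ob_size starts at
# offset sizeof(PyObject), which Python exposes as object.__basicsize__.
OB_SIZE_OFFSET = object.__basicsize__


def read_ob_size(obj):
    """Return the ob_size field of a list or tuple (CPython only)."""
    return ctypes.c_ssize_t.from_address(id(obj) + OB_SIZE_OFFSET).value


numbers = [10, 20, 30]
size_before = read_ob_size(numbers)
numbers.append(40)                        # the list grows in place
size_after = read_ob_size(numbers)

letters = ("a", "b")
print(size_before, size_after, read_ob_size(letters))
```

For both types the value tracks len(); the list can grow without moving because its header keeps a pointer to a separately allocated item array.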

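2. Introducing _typeobject (PyTypeObject)

Every object has a type, and that type is itself represented by a type object (not to be confused with an instance object). The data structure behind type objects is therefore extremely important. It is the struct _typeobject defined in Include/cpython/object.h, which also has the alias PyTypeObject: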

typedef struct _typeobject {
    PyObject_VAR_HEAD
    const char *tp_name; /* For printing, in format "<module>.<name>" */
    Py_ssize_t tp_basicsize, tp_itemsize; /* For allocation */

    /* Methods to implement standard operations */

    destructor tp_dealloc;
    Py_ssize_t tp_vectorcall_offset;
    getattrfunc tp_getattr;
    setattrfunc tp_setattr;
    PyAsyncMethods *tp_as_async; /* formerly known as tp_compare (Python 2)
                                    or tp_reserved (Python 3) */
    reprfunc tp_repr;

    /* Method suites for standard classes */

    PyNumberMethods *tp_as_number;
    PySequenceMethods *tp_as_sequence;
    PyMappingMethods *tp_as_mapping;

    /* More standard operations (here for binary compatibility) */

    hashfunc tp_hash;
    ternaryfunc tp_call;
    reprfunc tp_str;
    getattrofunc tp_getattro;
    setattrofunc tp_setattro;

    /* Functions to access object as input/output buffer */
    PyBufferProcs *tp_as_buffer;

    /* Flags to define presence of optional/expanded features */
    unsigned long tp_flags;

    const char *tp_doc; /* Documentation string */

    /* Assigned meaning in release 2.0 */
    /* call function for all accessible objects */
    traverseproc tp_traverse;

    /* delete references to contained objects */
    inquiry tp_clear;

    /* Assigned meaning in release 2.1 */
    /* rich comparisons */
    richcmpfunc tp_richcompare;

    /* weak reference enabler */
    Py_ssize_t tp_weaklistoffset;

    /* Iterators */
    getiterfunc tp_iter;
    iternextfunc tp_iternext;

    /* Attribute descriptor and subclassing stuff */
    struct PyMethodDef *tp_methods;
    struct PyMemberDef *tp_members;
    struct PyGetSetDef *tp_getset;
    struct _typeobject *tp_base;
    PyObject *tp_dict;
    descrgetfunc tp_descr_get;
    descrsetfunc tp_descr_set;
    Py_ssize_t tp_dictoffset;
    initproc tp_init;
    allocfunc tp_alloc;
    newfunc tp_new;
    freefunc tp_free; /* Low-level free-memory routine */
    inquiry tp_is_gc; /* For PyObject_IS_GC */
    PyObject *tp_bases;
    PyObject *tp_mro; /* method resolution order */
    PyObject *tp_cache;
    PyObject *tp_subclasses;
    PyObject *tp_weaklist;
    destructor tp_del;

    /* Type attribute cache version tag. Added in version 2.6 */
    unsigned int tp_version_tag;

    destructor tp_finalize;
    vectorcallfunc tp_vectorcall;

    /* bpo-37250: kept for backwards compatibility in CPython 3.8 only */
    Py_DEPRECATED(3.8) int (*tp_print)(PyObject *, FILE *, int);

#ifdef COUNT_ALLOCS
    /* these must be last and never explicitly initialized */
    Py_ssize_t tp_allocs;
    Py_ssize_t tp_frees;
    Py_ssize_t tp_maxalloc;
    struct _typeobject *tp_prev;
    struct _typeobject *tp_next;
#endif
} PyTypeObject;

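The structure is fairly complex, so the following sections go through it step by step.

2.2. The overall structure of PyTypeObject

PyTypeObject can be roughly divided into several parts:

  1. Part one: PyObject_VAR_HEAD, an important macro described below.
  2. Part two: tp_name, the type name, used mainly inside Python and when debugging.
  3. Part three: tp_basicsize and tp_itemsize, the memory sizes used when allocating instances.
  4. Part four: everything else. This part is more complex and is omitted here.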
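The second and third parts are visible from Python: for built-in types, __name__ is derived from tp_name, and __basicsize__ and __itemsize__ expose tp_basicsize and tp_itemsize. The sketch below prints them for a few types; describe is a hypothetical helper, and the exact byte counts depend on the platform and build.

```python
import ctypes
import sys

POINTER_SIZE = ctypes.sizeof(ctypes.c_void_p)


def describe(tp):
    """Return (name, basicsize, itemsize) as exposed to Python code."""
    return tp.__name__, tp.__basicsize__, tp.__itemsize__


for tp in (object, list, tuple):
    print(describe(tp))

# A tuple stores its items inline, so each extra element costs one itemsize.
growth = sys.getsizeof((1, 2, 3)) - sys.getsizeof(())
print(growth == 3 * tuple.__itemsize__)
```

Note that list reports an itemsize of 0 because its elements live in a separate buffer, while tuple reports one pointer per item.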

2.3. The very important macro PyObject_VAR_HEAD

The first key macro in the code above, PyObject_VAR_HEAD, is defined in Include/object.h:

/* PyObject_VAR_HEAD defines the initial segment of all variable-size
 * container objects.  These end with a declaration of an array with 1
 * element, but enough space is malloc'ed so that the array actually
 * has room for ob_size elements.  Note that ob_size is an element count,
 * not necessarily a byte count.
 */
#define PyObject_VAR_HEAD      PyVarObject ob_base;

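In other words, PyObject_VAR_HEAD declares a member named ob_base of type PyVarObject. Conceptually, it can be pictured as:

// Note: this is pseudocode to aid understanding, not real code
struct _PyVarObject {
    _PyObject_HEAD_EXTRA  // can be ignored; rarely used in normal builds
    Py_ssize_t ob_refcnt;  // reference count
    struct _typeobject *ob_type;  // pointer to the type object
    Py_ssize_t ob_size;  // number of items in the variable part (e.g. the number of elements in a list)
} PyObject_VAR_HEAD;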
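Because PyTypeObject begins with PyObject_VAR_HEAD, type objects carry the same three-field header as any other variable-size object. The sketch below mirrors the pseudocode with a ctypes.Structure and overlays it on live objects. VarHead and head_of are hypothetical names, and the layout is an assumption that holds only on a default build where _PyObject_HEAD_EXTRA is empty; the code checks the struct size and stops otherwise.

```python
import ctypes


class VarHead(ctypes.Structure):
    """Hypothetical ctypes mirror of the pseudocode header (default build only)."""
    _fields_ = [
        ("ob_refcnt", ctypes.c_ssize_t),   # reference count slot
        ("ob_type", ctypes.c_void_p),      # address of the type object
        ("ob_size", ctypes.c_ssize_t),     # number of items in the variable part
    ]


# Stop if this build's PyObject is not the two-field layout assumed above.
if ctypes.sizeof(VarHead) != object.__basicsize__ + ctypes.sizeof(ctypes.c_ssize_t):
    raise RuntimeError("unexpected object header layout on this build")


def head_of(obj):
    """View the first bytes of obj as a VarHead (obj must stay alive)."""
    return VarHead.from_address(id(obj))


pair = ("x", "y")
print(head_of(pair).ob_type == id(tuple), head_of(pair).ob_size)
print(head_of(int).ob_type == id(type))    # a type object has the same header
print(head_of(type).ob_type == id(type))   # and type's header points to itself
```

The overlay only reads memory; writing to these fields from Python would corrupt the interpreter.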

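Copyright notice: this is an original article by u013391094, licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.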