MySQL 8.0统计信息不准确的原因
前言
不管是Oracle还是MySQL,新版本推出的新特性,一方面给产品带来功能、性能、用户体验等方面的提升,另一方面也可能会带来一些问题,如代码bug、客户使用方法不正确引发问题等等。
案例分享
MySQL5.7下的场景
(1)首先,创建两张表,并插入数据
mysql>selectversion(); +------------+ |version()| +------------+ |5.7.30-log| +------------+ 1rowinset(0.00sec) mysql>showcreatetabletest\G ***************************1.row*************************** Table:test CreateTable:CREATETABLE`test`( `id`int(10)unsignedNOTNULLAUTO_INCREMENT, `k`int(10)unsignedNOTNULLDEFAULT'0', `c`char(120)NOTNULLDEFAULT'', `pad`char(60)NOTNULLDEFAULT'', PRIMARYKEY(`id`), KEY`k_1`(`k`) )ENGINE=InnoDBAUTO_INCREMENT=101DEFAULTCHARSET=utf8mb4MAX_ROWS=1000000 1rowinset(0.00sec) mysql>showcreatetablesbtest1\G ***************************1.row*************************** Table:sbtest1 CreateTable:CREATETABLE`sbtest1`( `id`int(10)unsignedNOTNULLAUTO_INCREMENT, `k`int(10)unsignedNOTNULLDEFAULT'0', `c`char(120)NOTNULLDEFAULT'', `pad`char(60)NOTNULLDEFAULT'', PRIMARYKEY(`id`), KEY`k_1`(`k`) )ENGINE=InnoDBAUTO_INCREMENT=1000001DEFAULTCHARSET=utf8mb4MAX_ROWS=1000000 1rowinset(0.00sec) mysql>selectcount(*)fromtest; +----------+ |count(*)| +----------+ |100| +----------+ 1rowinset(0.00sec) mysql>selectcount(*)fromsbtest1; +----------+ |count(*)| +----------+ |1000000| +----------+ 1rowinset(0.14sec)
(2)查看两张表的统计信息,均比较准确
mysql>selecttable_schema,table_name,table_rowsfromtableswheretable_name='test'; +--------------+------------+------------+ |table_schema|table_name|table_rows| +--------------+------------+------------+ |test|test|100| +--------------+------------+------------+ 1rowinset(0.00sec) mysql>selecttable_schema,table_name,table_rowsfromtableswheretable_name='sbtest1'; +--------------+------------+------------+ |table_schema|table_name|table_rows| +--------------+------------+------------+ |test|sbtest1|947263| +--------------+------------+------------+ 1rowinset(0.00sec)
(3)我们持续往test表插入1000w条记录,并再次查看统计信息,还是相对准确的,因为在默认情况下,数据变化量超过10%,就会触发统计信息更新
mysql>selectcount(*)fromtest; +----------+ |count(*)| +----------+ |10000100| +----------+ 1rowinset(1.50sec) mysql>selecttable_schema,table_name,table_rowsfromtableswheretable_name='test'; +--------------+------------+------------+ |table_schema|table_name|table_rows| +--------------+------------+------------+ |test|test|9749036| +--------------+------------+------------+ 1rowinset(0.00sec)
MySQL8.0下的场景
(1)接下来我们看看8.0下的情况吧,同样地,我们创建两张表,并插入相同记录
mysql>selectversion(); +-----------+ |version()| +-----------+ |8.0.20| +-----------+ 1rowinset(0.00sec) mysql>showcreatetabletest\G ***************************1.row*************************** Table:test CreateTable:CREATETABLE`test`( `id`intunsignedNOTNULLAUTO_INCREMENT, `k`intunsignedNOTNULLDEFAULT'0', `c`char(120)NOTNULLDEFAULT'', `pad`char(60)NOTNULLDEFAULT'', PRIMARYKEY(`id`), KEY`k_1`(`k`) )ENGINE=InnoDBAUTO_INCREMENT=101DEFAULTCHARSET=utf8mb4COLLATE=utf8mb4_0900_ai_ciMAX_ROWS=1000000 1rowinset(0.00sec) mysql>showcreatetablesbtest1\G ***************************1.row*************************** Table:sbtest1 CreateTable:CREATETABLE`sbtest1`( `id`intunsignedNOTNULLAUTO_INCREMENT, `k`intunsignedNOTNULLDEFAULT'0', `c`char(120)NOTNULLDEFAULT'', `pad`char(60)NOTNULLDEFAULT'', PRIMARYKEY(`id`), KEY`k_1`(`k`) )ENGINE=InnoDBAUTO_INCREMENT=1000001DEFAULTCHARSET=utf8mb4COLLATE=utf8mb4_0900_ai_ciMAX_ROWS=1000000 1rowinset(0.00sec) mysql>selectcount(*)fromtest; +----------+ |count(*)| +----------+ |100| +----------+ 1rowinset(0.00sec) mysql>selectcount(*)fromsbtest1; +----------+ |count(*)| +----------+ |1000000| +----------+ 1rowinset(0.02sec)
(2)查看两张表的统计信息,均比较准确
mysql>selecttable_schema,table_name,table_rowsfromtableswheretable_name='test'; +--------------+------------+------------+ |TABLE_SCHEMA|TABLE_NAME|TABLE_ROWS| +--------------+------------+------------+ |test|test|100| +--------------+------------+------------+ 1rowinset(0.00sec) mysql>selecttable_schema,table_name,table_rowsfromtableswheretable_name='sbtest1'; +--------------+------------+------------+ |TABLE_SCHEMA|TABLE_NAME|TABLE_ROWS| +--------------+------------+------------+ |test|sbtest1|947468| +--------------+------------+------------+ 1rowinset(0.01sec)
(3)同样地,我们持续往test表插入1000w条记录,并再次查看统计信息,发现table_rows显示还是100条,出现了较大偏差
mysql>selectcount(*)fromtest; +----------+ |count(*)| +----------+ |10000100| +----------+ 1rowinset(0.33sec) mysql>selecttable_schema,table_name,table_rowsfromtableswheretable_name='test'; +--------------+------------+------------+ |TABLE_SCHEMA|TABLE_NAME|TABLE_ROWS| +--------------+------------+------------+ |test|test|100| +--------------+------------+------------+ 1rowinset(0.00sec)
原因剖析
那么导致统计信息不准确的原因是什么呢?其实是MySQL8.0为了提高information_schema的查询效率,将视图tables和statistics里面的统计信息缓存起来,缓存过期时间由参数information_schema_stats_expiry决定,默认为86400s;如果想获取最新的统计信息,可以通过如下两种方式:
(1)analyzetable进行表分析
(2)设置information_schema_stats_expiry=0
继续探索
那么统计信息不准确,会带来哪些影响呢?是否会影响执行计划呢?接下来我们再次进行测试
测试1:表test记录数100,表sbtest1记录数100w
执行如下SQL,查看执行计划,走的是NLJ,小表test作为驱动表(全表扫描),大表sbtest1作为被驱动表(主键关联),执行效率很快
mysql>selectcount(*)fromtest; +----------+ |count(*)| +----------+ |100| +----------+ 1rowinset(0.00sec) mysql>selectcount(*)fromsbtest1; +----------+ |count(*)| +----------+ |1000000| +----------+ 1rowinset(0.02sec) mysql>selecttable_schema,table_name,table_rowsfromtableswheretable_name='test'; +--------------+------------+------------+ |TABLE_SCHEMA|TABLE_NAME|TABLE_ROWS| +--------------+------------+------------+ |test|test|100| +--------------+------------+------------+ 1rowinset(0.00sec) mysql>selecttable_schema,table_name,table_rowsfromtableswheretable_name='sbtest1'; +--------------+------------+------------+ |TABLE_SCHEMA|TABLE_NAME|TABLE_ROWS| +--------------+------------+------------+ |test|sbtest1|947468| +--------------+------------+------------+ 1rowinset(0.01sec) mysql>selectt.*fromtesttinnerjoinsbtest1t1ont.id=t1.idwheret.c='08566691963-88624912351-16662227201-46648573979-64646226163-77505759394-75470094713-41097360717-15161106334-50535565977'andt1.c='08566691963-88624912351-16662227201-46648573979-64646226163-77505759394-75470094713-41097360717-15161106334-50535565977'; +----+--------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+ |id|k|c|pad| +----+--------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+ |1|501885|08566691963-88624912351-16662227201-46648573979-64646226163-77505759394-75470094713-41097360717-15161106334-50535565977|63188288836-92351140030-06390587585-66802097351-49282961843| +----+--------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+ 1rowinset(0.00sec) mysql>explainselectt.*fromtesttinnerjoinsbtest1t1ont.id=t1.idwheret.c='08566691963-88624912351-16662227201-46648573979-64646226163-77505759394-75470094713-41097360717-15161106334-50535565977'andt1.c='08566691963-88624912351-16662227201-4664 +----+-------------+-------+------------+--------+---------------+---------+---------+-----------+------+----------+-------------+ |id|select_type|table|partitions|type|possible_keys|key|key_len|ref|rows|filtered|Extra| +----+-------------+-------+------------+--------+---------------+---------+---------+-----------+------+----------+-------------+ |1|SIMPLE|t|NULL|ALL|PRIMARY|NULL|NULL|NULL|100|10.00|Usingwhere| |1|SIMPLE|t1|NULL|eq_ref|PRIMARY|PRIMARY|4|test.t.id|1|10.00|Usingwhere| +----+-------------+-------+------------+--------+---------------+---------+---------+-----------+------+----------+-------------+ 2rowsinset,1warning(0.00sec)
测试2:表test记录数1000w左右,表sbtest1记录数100w
再次执行SQL,查看执行计划,走的也是NLJ,相对小表sbtest1作为驱动表,大表test作为被驱动表,也是正确的执行计划
mysql>selectcount(*)fromtest; +----------+ |count(*)| +----------+ |10000100| +----------+ 1rowinset(0.33sec) mysql>selectcount(*)fromsbtest1; +----------+ |count(*)| +----------+ |1000000| +----------+ 1rowinset(0.02sec) mysql>selecttable_schema,table_name,table_rowsfromtableswheretable_name='test'; +--------------+------------+------------+ |TABLE_SCHEMA|TABLE_NAME|TABLE_ROWS| +--------------+------------+------------+ |test|test|100| +--------------+------------+------------+ 1rowinset(0.00sec) mysql>selecttable_schema,table_name,table_rowsfromtableswheretable_name='sbtest1'; +--------------+------------+------------+ |TABLE_SCHEMA|TABLE_NAME|TABLE_ROWS| +--------------+------------+------------+ |test|sbtest1|947468| +--------------+------------+------------+ 1rowinset(0.01sec) mysql>selectt.*fromtesttinnerjoinsbtest1t1ont.id=t1.idwheret.c='08566691963-88624912351-16662227201-46648573979-64646226163-77505759394-75470094713-41097360717-15161106334-50535565977'andt1.c='08566691963-88624912351-16662227201-46648573979-64646226163-77505759394-75470094713-41097360717-15161106334-50535565977'; +----+--------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+ |id|k|c|pad| +----+--------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+ |1|501885|08566691963-88624912351-16662227201-46648573979-64646226163-77505759394-75470094713-41097360717-15161106334-50535565977|63188288836-92351140030-06390587585-66802097351-49282961843| +----+--------+-------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------+ 1rowinset(0.37sec) mysql>explainselectt.*fromtesttinnerjoinsbtest1t1ont.id=t1.idwheret.c='08566691963-88624912351-16662227201-46648573979-64646226163-77505759394-75470094713-41097360717-15161106334-50535565977'andt1.c='08566691963-88624912351-16662227201-46648573979-64646226163-77505759394-75470094713-41097360717-15161106334-50535565977'; +----+-------------+-------+------------+--------+---------------+---------+---------+------------+--------+----------+-------------+ |id|select_type|table|partitions|type|possible_keys|key|key_len|ref|rows|filtered|Extra| +----+-------------+-------+------------+--------+---------------+---------+---------+------------+--------+----------+-------------+ |1|SIMPLE|t1|NULL|ALL|PRIMARY|NULL|NULL|NULL|947468|10.00|Usingwhere| |1|SIMPLE|t|NULL|eq_ref|PRIMARY|PRIMARY|4|test.t1.id|1|10.00|Usingwhere| +----+-------------+-------+------------+--------+---------------+---------+---------+------------+--------+----------+-------------+ 2rowsinset,1warning(0.01sec)
为什么优化器没有选择错误的执行计划呢?之前文章也提过,MySQL8.0是将元数据信息存放在mysql库下的数据字典表里,information_schema库只是提供相对方便的视图供用户查询,所以优化器在选择执行计划时,会从数据字典表中获取统计信息,生成正确的执行计划。
总结
MySQL8.0为了提高information_schema的查询效率,会将视图tables和statistics里面的统计信息缓存起来,缓存过期时间由参数information_schema_stats_expiry决定(建议设置该参数值为0);这可能会导致用户查询相应视图时,无法获取最新、准确的统计信息,但并不会影响执行计划的选择。
以上就是MySQL8.0统计信息不准确的原因的详细内容,更多关于MySQL8.0统计信息不准确的资料请关注毛票票其它相关文章!
声明:本文内容来源于网络,版权归原作者所有,内容由互联网用户自发贡献自行上传,本网站不拥有所有权,未作人工编辑处理,也不承担相关法律责任。如果您发现有涉嫌版权的内容,欢迎发送邮件至:czq8825#qq.com(发邮件时,请将#更换为@)进行举报,并提供相关证据,一经查实,本站将立刻删除涉嫌侵权内容。