Problem description
The problem splits into two parts: how to check which working days are missing from my database and, if some are missing, how to add them and fill each new row with the values from the closest date.
First part: check for and find the missing days. Should I use a gap approach like in the example below?
SELECT t1.col1 AS startOfGap,
       MIN(t2.col1) AS endOfGap
FROM (
    SELECT col1 = theDate + 1
    FROM sampleDates tbl1
    WHERE NOT EXISTS (SELECT * FROM sampleDates tbl2 WHERE tbl2.theDate = tbl1.theDate + 1)
      AND theDate <> (SELECT MAX(theDate) FROM sampleDates)
) t1
INNER JOIN (
    SELECT col1 = theDate - 1
    FROM sampleDates tbl1
    WHERE NOT EXISTS (SELECT * FROM sampleDates tbl2 WHERE tbl1.theDate = tbl2.theDate + 1)
      AND theDate <> (SELECT MIN(theDate) FROM sampleDates)
) t2
    ON t1.col1 <= t2.col1
GROUP BY t1.col1;
Then I need to see which existing date is closest to the one that was missing, and fill the newly inserted date (the one that was missing) with the values from that closest date. Some time ago I came up with something to get the closest value from a row, but this time I need to adapt it to check both downwards and upwards.
SELECT t, A, C, Y,
       COALESCE(Y, (SELECT TOP (1) Y
                    FROM tableT AS p2
                    WHERE p2.Y IS NOT NULL
                      AND p2.[t] <= p.[t]
                      AND p.C = p2.C
                    ORDER BY p2.[t] DESC)) AS YNew
FROM tableT AS p
ORDER BY C, t
How do I combine the two into one?
Thanks.
Expected result
Current data:

Date          1mA
20.12.2012    0.152
21.12.2012    0.181
22 weekend    so it's skipped (they are skipped automatically)
23 weekend    -,-
24 missing
25 missing
26 missing
27.12.2012    0.173
28.12.2012    0.342

Desired data:

Date          1mA
20.12.2012    0.152
21.12.2012    0.181
22 weekend    so it's skipped (they are skipped automatically)
23 weekend    0.181
24 missing    0.181
25 missing    0.181
26 missing    0.173
27.12.2012    0.173
28.12.2012    0.342
So 24, 25 and 26 are not even there with NULL values. The rows are simply not there.
EDIT 2: For taking the closest value, let's consider the scenario in which I'm always looking above, so always going back 1 when a date is missing.
Date          1mA
20.12.2012    0.152
21.12.2012    0.181
22 weekend    so it's skipped (they are skipped automatically)
23 weekend    0.181
24 missing    0.181
25 missing    0.181
26 missing    0.181
27.12.2012    0.173
28.12.2012    0.342
Recommended answer
For these types of queries you gain significant performance benefits from creating a calendar table containing every date you'll ever need to test. (If you're familiar with the term "dimension tables", this is just one such table, used to enumerate every date of interest.)
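As a rough illustration of what such a calendar table could look like, here is a minimal T-SQL sketch. The table name `calendar` and column `calendar_date` match the names used in this answer's queries; the date range, the recursive CTE named `days`, and the Monday-to-Friday filter are assumptions made for the example. Public holidays are not handled, so they would need to be removed separately if they should not count as working days.

```sql
-- Assumed shape: one row per working day, keyed by the date itself.
CREATE TABLE calendar (
    calendar_date DATE NOT NULL PRIMARY KEY
);

-- Generate every day of 2015 with a recursive CTE, then keep Monday to Friday only.
WITH days AS (
    SELECT CAST('2015-01-01' AS DATE) AS d
    UNION ALL
    SELECT DATEADD(DAY, 1, d)
    FROM days
    WHERE d < '2015-12-31'
)
INSERT INTO calendar (calendar_date)
SELECT d
FROM days
-- 1900-01-01 was a Monday, so the remainder is 0 = Monday ... 6 = Sunday,
-- independent of the session's DATEFIRST setting.
WHERE DATEDIFF(DAY, '19000101', d) % 7 < 5
-- The default recursion limit of 100 is too low for a full year.
OPTION (MAXRECURSION 366);
```

The table is populated once and then reused by every gap-filling query, which is where the performance benefit comes from.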
Also, the query as a whole can become significantly simpler.
SELECT
    cal.calendar_date AS data_date,
    CASE WHEN prev_data.gap <= next_data.gap
         THEN prev_data.data_value
         ELSE COALESCE(next_data.data_value, prev_data.data_value)
    END AS data_value
FROM
    calendar AS cal
OUTER APPLY
(
    SELECT TOP(1)
        data_date,
        data_value,
        DATEDIFF(DAY, data_date, cal.calendar_date) AS gap
    FROM data_table
    WHERE data_date <= cal.calendar_date
    ORDER BY data_date DESC
) prev_data
OUTER APPLY
(
    SELECT TOP(1)
        data_date,
        data_value,
        DATEDIFF(DAY, cal.calendar_date, data_date) AS gap
    FROM data_table
    WHERE data_date > cal.calendar_date
    ORDER BY data_date ASC
) next_data
WHERE
    cal.calendar_date BETWEEN '2015-01-01' AND '2015-12-31'
;
EDIT: Reply to your comment with a different requirement
Always getting "the value above" is easier, and inserting those values into a table is easy enough...
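-- Explicit column list so the insert does not depend on the table's column order.
INSERT INTO data_table (data_date, data_value)
SELECT
    cal.calendar_date,
    prev_data.data_value
FROM
    calendar AS cal
CROSS APPLY
(
    -- Most recent existing row on or before the calendar date.
    SELECT TOP(1)
        data_date,
        data_value
    FROM data_table
    WHERE data_date <= cal.calendar_date
    ORDER BY data_date DESC
) prev_data
WHERE
    cal.calendar_date BETWEEN '2015-01-01' AND '2015-12-31'
    -- Skip dates that already have a row of their own.
    AND cal.calendar_date <> prev_data.data_date
;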
Note: You could add the condition prev_data.gap > 0 to the bigger query above to get only the dates that don't already have data. That query already has a WHERE clause, so the condition goes in with AND.
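For illustration, the closest-value query with that filter applied might look like the sketch below. It reuses the `calendar` and `data_table` names from this answer. The extra `prev_data.gap IS NULL` branch is an assumption added here so that calendar dates earlier than the first data row are still returned (filled from the next row) rather than being dropped by the comparison with NULL.

```sql
SELECT
    cal.calendar_date AS data_date,
    CASE WHEN prev_data.gap <= next_data.gap
         THEN prev_data.data_value
         ELSE COALESCE(next_data.data_value, prev_data.data_value)
    END AS data_value
FROM calendar AS cal
OUTER APPLY
(
    -- Closest row on or before the calendar date (gap = 0 means the date exists).
    SELECT TOP (1)
        data_value,
        DATEDIFF(DAY, data_date, cal.calendar_date) AS gap
    FROM data_table
    WHERE data_date <= cal.calendar_date
    ORDER BY data_date DESC
) AS prev_data
OUTER APPLY
(
    -- Closest row strictly after the calendar date.
    SELECT TOP (1)
        data_value,
        DATEDIFF(DAY, cal.calendar_date, data_date) AS gap
    FROM data_table
    WHERE data_date > cal.calendar_date
    ORDER BY data_date ASC
) AS next_data
WHERE cal.calendar_date BETWEEN '2015-01-01' AND '2015-12-31'
  -- Keep only calendar dates with no row of their own in data_table.
  AND (prev_data.gap > 0 OR prev_data.gap IS NULL)
;
```

On a tie between the previous and next row, the `<=` comparison prefers the earlier date's value.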