News:

The AARoads Wiki is live! Come check it out!

Main Menu

Excel Question

Started by webny99, September 08, 2023, 06:40:57 PM

Previous topic - Next topic

webny99

I think I recall that a number of users here are familiar with Excel, so I figured I'd see if anyone has any solutions to a problem I'm having.

In trying to compile and analyze Thruway AADT data, I've discovered that all zero data points do not exist in the data. In other words, if 0 vehicles were recorded in a given lane in a given hour, that hour is entirely skipped from the data, as if it did not exist. This is a problem because in order to interpret the data, I've been using the =WRAPCOLS function to organize it in columns of 24 (hours per day) to obtain daily volumes, and this obviously does not work properly if there are missing data points. Even one missing point throws it off, and searching for such among a year's worth of data (8064 data points) is like searching for a needle in a haystack, but even more tedious.

So, I have two columns of data, Column 1 has the date and hour in [MM/DD/YYYY XX:00] format, and Column 2 has the traffic count value for that hour. Is there any function I can use to add rows for all of the missing [MM/DD/YYYY XX:00] values in column 1 and autofill those rows with zeroes in column 2?

Google has been no help, and a solution could save me upwards of an hour x ~15 remaining datasets, so any advice is much appreciated. Thanks!


JayhawkCO

Could you not just put a filter (Data->Filter) on the data, filter the one column for blanks, and then put zeroes in the other column?

kphoger

There might be a way to more easily identify the missing data points by using a pivot table, but that's the best I've got.  However, I can't think of a way to get those zeroes back into the original dataset.
Keep right except to pass.  Yes.  You.
Visit scenic Orleans County, NY!
Male pronouns, please.

Quote from: Philip K. DickIf you can control the meaning of words, you can control the people who must use them.

JayhawkCO

#3
Quote from: JayhawkCO on September 08, 2023, 06:45:47 PM
Could you not just put a filter (Data->Filter) on the data, filter the one column for blanks, and then put zeroes in the other column?

Oh, I missed you wanted to add the rows too.

Here's probably what I would do. On a separate sheet, create a column that has every time segment you're looking for. That should be easy enough to create the first couple and then do a fill action down to the bottom. Then, on that sheet's B column, use something like =IF(COUNTIF('Other Sheet'A1:A10000,A1)>0,VLOOKUP(A1,'Other Sheet'A1:B10000,2),0).

jeffandnicole

In case Jayhawk's advice doesn't work...

Are there separation objects used, usually commas or other punctuation?

If so, highlight the column of the data, then in the ribbon, select 'Data', then 'Text to Columns', then Delimited, Next, Select punctuation used or Other and enter separator used, Next, General or Text (whichever works best), Finish.

Scott5114

If you export the sheet to CSV, you might be able to open it in a text editor and use find-and-replace to add some sort of dummy value like 0.00001 (and then set it to round to the nearest whole number when you get it back in Excel).
uncontrollable freak sardine salad chef

kalvado

Ugly but should work:
If you add a column of timestamp (n)-timestamp(n+1) - 1 that would create marker coloumn
Use Excel "if" to add some string marker for non-zero cells, like qwerty; export to .csv and use text editor to replace that marker with new line and something.

webny99

Quote from: JayhawkCO on September 08, 2023, 06:54:56 PM
Quote from: JayhawkCO on September 08, 2023, 06:45:47 PM
Could you not just put a filter (Data->Filter) on the data, filter the one column for blanks, and then put zeroes in the other column?

Oh, I missed you wanted to add the rows too.

Here's probably what I would do. On a separate sheet, create a column that has every time segment you're looking for. That should be easy enough to create the first couple and then do a fill action down to the bottom. Then, on that sheet's B column, use something like =IF(COUNTIF('Other Sheet'A1:A10000,A1)>0,VLOOKUP(A1,'Other Sheet'A1:A10000,2),0).

OK, I think I follow. I'll give this a shot. Seeing as I only have two columns, I'll keep the new data on the same sheet (in columns D and E) for simplicity. One question, what does the "2" before the last "0" represent?

MikeTheActuary

#8
Please note my username.  I get paid to spend WAY too much time doing obscene things with spreadsheets.   :)

Quote from: webny99 on September 08, 2023, 07:28:14 PM
Quote from: JayhawkCO on September 08, 2023, 06:54:56 PM
Quote from: JayhawkCO on September 08, 2023, 06:45:47 PM
Could you not just put a filter (Data->Filter) on the data, filter the one column for blanks, and then put zeroes in the other column?

Oh, I missed you wanted to add the rows too.

Here's probably what I would do. On a separate sheet, create a column that has every time segment you're looking for. That should be easy enough to create the first couple and then do a fill action down to the bottom. Then, on that sheet's B column, use something like =IF(COUNTIF('Other Sheet'A1:A10000,A1)>0,VLOOKUP(A1,'Other Sheet'A1:A10000,2),0).

OK, I think I follow. I'll give this a shot. Seeing as I only have two columns, I'll keep the new data on the same sheet (in columns D and E) for simplicity. One question, what does the "2" before the last "0" represent?

The VLOOKUP section should be VLOOKUP(A1,'Other Sheet'!A1:B10000,2), with the "2" saying pull the value from the second column.   If you use this format, 'Other Sheet'!A1:B10000 will need to be sorted in ascending order.

Rather than use a both COUNTIF and VLOOKUP, I'd do something like this:

=SUMIFS(ColumnOfCounts,ColumnOfTimeStamps,YourTimestamp)

or

=IFERROR(VLOOKUP(YourTimeStamp,OriginalData,ColumnNumber,FALSE),0)

....assuming the first column of OriginalData has the timestamps, and assuming that ColumnNumber represents whichever column has the data you're pulling.

That "FALSE" clause indicates to return a value only if there's an exact match of YourTimeStamp in the first column of OriginalData; otherwise it returns #N/A.   If you nest that inside the IFERROR(), that closing ,0 is saying "give me the result of the main formula...but if the result is an error, give me a zero".

Depending on how big the source data table is, SUMIFS, COUNTIFS, and VLOOKUPS can be a little slow and memory-inefficient.  If you have more than, say, 20-25k rows of data, you might instead consider a new tab laid out as follows:

Column A:  The list of time stamps you want to use, including the missing ones
Column B -- example formula for B2: =MATCH(A2,ColumnofOriginalTimestamps,0)     -- will return where in ColumnOfOriginalTimestamps the match is; if there is no match, it will return #N/A.
Column C -- example formula for C2: =IF(ISNA($B2),0,INDEX(ColumnOfLaneCounts,$B2))    -- if $B2=#N/A, return 0, else returns the B2th value in the ColumnOfLaneCounts

....and if you have multiple columns representing counts in different lanes, just repeat the formula in column C, changing the reference to ColumnOfLaneCounts, as necessary.

INDEX(MATCH()) is considerably more efficient than VLOOKUP or SUMIFS/COUNTIFS.  And if you're trying to pull multiple columns of data for each row, it's most efficient to do the "find which row I want" formula just once.

However, one thing to consider with INDEX(MATCH()), COUNTIFS(), and SUMIFS() -- they don't behave as you'd hope if they're linking to data in a different file.  They only work properly when both files are open.  If you close both files, and then reopen the file with those funtions, you will get errors until you open the source data file.   You can get around this by working only within the source file, or doing a copy-paste-values once you've harvested the data.  In this case, however, I recommend adding the original formula as a comment at the top of the relevant rows before you do the copy paste-values, so that you have something to work with if you need to repeat the exercise later.

Finally...in the most recent versions of Excel, there is a new function, XLOOKUP().  I understand that its efficiency rivals that of my beloved INDEX(MATCH()) trick, and it doesn't have the linked-file issues.  However, this is not something I've had a chance to test out, since my work's IT department hasn't given me a new enough version of Excel on my server.   If you have XLOOKUP, and think it might do the trick....the format is more like that of SUMIFS than VLOOKUP, but it's something you can figure out by looking up the syntax in Microsoft's documentation.

webny99

Quote from: MikeTheActuary on September 08, 2023, 09:45:58 PM
Please note my username.  I get paid to spend WAY too much time doing obscene things with spreadsheets.   :)

Great, thanks for the detailed reply!


Quote from: MikeTheActuary on September 08, 2023, 09:45:58 PM
The VLOOKUP section should be VLOOKUP(A1,'Other Sheet'!A1:B10000,2), with the "2" saying pull the value from the second column.   If you use this format, 'Other Sheet'!A1:B10000 will need to be sorted in ascending order.

OK, thanks. That clarified and I got this formula to work at first, but it stopped working as soon as I hit a cell that needed a 0 filled in...


Quote from: MikeTheActuary on September 08, 2023, 09:45:58 PM
Rather than use a both COUNTIF and VLOOKUP, I'd do something like this:

=SUMIFS(ColumnOfCounts,ColumnOfTimeStamps,YourTimestamp)

Wow, this works too - and I like it better because it's a bit simpler. At first, it was returning #NA when I hit a cell that needed a 0, but then I realized the formula range was moving down with each cell (turns out that was my problem with Jayhawkco's formula, too), so as soon as I was offset by a row, the equivalent cell for the next row was no longer in the range to pull from. I solved that by just moving both datasets down a few hundred cells while starting the formulas ranges from row 1. There's probably an easy fix for that, but this is still a great deal easier than what I was doing before. Thanks again!

jeffandnicole

Quote from: MikeTheActuary on September 08, 2023, 09:45:58 PM
INDEX(MATCH()) is considerably more efficient than VLOOKUP or SUMIFS/COUNTIFS.  And if you're trying to pull multiple columns of data for each row, it's most efficient to do the "find which row I want" formula just once.

This, by the way, is my favorite formula to use. Especially since I'm constantly adding to and modifying my data on a daily basis where data won't be neatly sorted to allow vlookup to work. 

Some of my largest workbook files use approximately 275 columns (252 or so for each business day of the year, plus rollover into the following year), and approximately 60 tabs, with each spreadsheet detailing a different account which all needs to roll into a single summary sheet.  Another large file is about 140 tabs and growing, also representing different accounts.

I had a boss years ago who said I shouldn't make the spreadsheets overly complicated.  I didn't listen.

MikeTheActuary

Quote from: jeffandnicole on September 08, 2023, 11:49:02 PM
I had a boss years ago who said I shouldn't make the spreadsheets overly complicated.  I didn't listen.

Do to some rearranging of functions after an acquisition, I've picked up a new boss, at least until the dust settles and I can train a replacement for some of my duties.

He was not amused when I showed him a 1.3GB .xlsb file I work with.

In fairness, I really should have done that particular task in R or SQL....but I was pressed for time when I built it, and I know Excel.   I'll leave the headache of rebuilding it in something else for my successor.

Scott5114

Quote from: jeffandnicole on September 08, 2023, 11:49:02 PM
Quote from: MikeTheActuary on September 08, 2023, 09:45:58 PM
INDEX(MATCH()) is considerably more efficient than VLOOKUP or SUMIFS/COUNTIFS.  And if you're trying to pull multiple columns of data for each row, it's most efficient to do the "find which row I want" formula just once.

This, by the way, is my favorite formula to use. Especially since I'm constantly adding to and modifying my data on a daily basis where data won't be neatly sorted to allow vlookup to work. 

Some of my largest workbook files use approximately 275 columns (252 or so for each business day of the year, plus rollover into the following year), and approximately 60 tabs, with each spreadsheet detailing a different account which all needs to roll into a single summary sheet.  Another large file is about 140 tabs and growing, also representing different accounts.

I had a boss years ago who said I shouldn't make the spreadsheets overly complicated.  I didn't listen.

I'd say you're well past the point that learning an actual programming language would take less time than...that.
uncontrollable freak sardine salad chef

MikeTheActuary

#13
Quote from: Scott5114 on September 09, 2023, 01:03:40 AM
Quote from: jeffandnicole on September 08, 2023, 11:49:02 PM
Quote from: MikeTheActuary on September 08, 2023, 09:45:58 PM
INDEX(MATCH()) is considerably more efficient than VLOOKUP or SUMIFS/COUNTIFS.  And if you're trying to pull multiple columns of data for each row, it's most efficient to do the "find which row I want" formula just once.

This, by the way, is my favorite formula to use. Especially since I'm constantly adding to and modifying my data on a daily basis where data won't be neatly sorted to allow vlookup to work. 

Some of my largest workbook files use approximately 275 columns (252 or so for each business day of the year, plus rollover into the following year), and approximately 60 tabs, with each spreadsheet detailing a different account which all needs to roll into a single summary sheet.  Another large file is about 140 tabs and growing, also representing different accounts.

I had a boss years ago who said I shouldn't make the spreadsheets overly complicated.  I didn't listen.

I'd say you're well past the point that learning an actual programming language would take less time than...that.

At least in my case, my too-large spreadsheets grow out of one or more of:


  • It started off small, and grew
  • It started off as a one-time task, but became recurrent
  • I have to build something that someone else can review and potentially take over if I get reassigned or hit by a bus...and (in my profession, at least) everybody knows Excel
  • Most auditors are also proficient in Excel; if the auditor assigned to torture me this year isn't proficient in the tools being used, I'll likely have to teach them
  • Corporate IT imposes roadblocks on accessing better tools (e.g. restrictions on installing R or python, blocking CRAN in the corporate nannyguard, turnaround time to get a new SQL database is measured in days....)
  • Rebuilding it in a more suitable platform, testing it, and making changes to downstream files/routines that rely on the spreadsheet's results takes time that I don't have.
...and that's why people like me abuse Excel.

The expansion of the number of columns and rows Excel supports in a single tab was both the best and the worst thing to have happened to me since I started using Excel professionally.

jeffandnicole

Quote from: MikeTheActuary on September 09, 2023, 08:10:35 AM
Quote from: Scott5114 on September 09, 2023, 01:03:40 AM
Quote from: jeffandnicole on September 08, 2023, 11:49:02 PM
Quote from: MikeTheActuary on September 08, 2023, 09:45:58 PM
INDEX(MATCH()) is considerably more efficient than VLOOKUP or SUMIFS/COUNTIFS.  And if you're trying to pull multiple columns of data for each row, it's most efficient to do the "find which row I want" formula just once.

This, by the way, is my favorite formula to use. Especially since I'm constantly adding to and modifying my data on a daily basis where data won't be neatly sorted to allow vlookup to work. 

Some of my largest workbook files use approximately 275 columns (252 or so for each business day of the year, plus rollover into the following year), and approximately 60 tabs, with each spreadsheet detailing a different account which all needs to roll into a single summary sheet.  Another large file is about 140 tabs and growing, also representing different accounts.

I had a boss years ago who said I shouldn't make the spreadsheets overly complicated.  I didn't listen.

I'd say you're well past the point that learning an actual programming language would take less time than...that.

At least in my case, my too-large spreadsheets grow out of one or more of:


  • It started off small, and grew
  • It started off as a one-time task, but became recurrent
  • I have to build something that someone else can review and potentially take over if I get reassigned or hit by a bus...and (in my profession, at least) everybody knows Excel
  • Most auditors are also proficient in Excel; if the auditor assigned to torture me this year isn't proficient in the tools being used, I'll likely have to teach them
  • Corporate IT imposes roadblocks on accessing better tools (e.g. restrictions on installing R or python, blocking CRAN in the corporate nannyguard, turnaround time to get a new SQL database is measured in days....)
  • Rebuilding it in a more suitable platform, testing it, and making changes to downstream files/routines that rely on the spreadsheet's results takes time that I don't have.
...and that's why people like me abuse Excel.

The expansion of the number of columns and rows Excel supports in a single tab was both the best and the worst thing to have happened to me since I started using Excel professionally.

#1: Congrats on the 1.3 GB file. That certainly blows me away. I am honestly impressed you created something that large.

#2: My answer to Scott is similar to your answer, with the most important issue being I need to send my file to others to use in my Division. I know Excel. I don't know Access. Others know Excel. Others don't know Access. We had one person that knew Access well and he left. I did bring up to management the issue with only me knowing the depths of the workbooks, but for now they don't seem to have any interest in buying another program, as what I created works.

vdeane

Quote from: webny99 on September 08, 2023, 06:40:57 PM
I think I recall that a number of users here are familiar with Excel, so I figured I'd see if anyone has any solutions to a problem I'm having.

In trying to compile and analyze Thruway AADT data, I've discovered that all zero data points do not exist in the data. In other words, if 0 vehicles were recorded in a given lane in a given hour, that hour is entirely skipped from the data, as if it did not exist. This is a problem because in order to interpret the data, I've been using the =WRAPCOLS function to organize it in columns of 24 (hours per day) to obtain daily volumes, and this obviously does not work properly if there are missing data points. Even one missing point throws it off, and searching for such among a year's worth of data (8064 data points) is like searching for a needle in a haystack, but even more tedious.

So, I have two columns of data, Column 1 has the date and hour in [MM/DD/YYYY XX:00] format, and Column 2 has the traffic count value for that hour. Is there any function I can use to add rows for all of the missing [MM/DD/YYYY XX:00] values in column 1 and autofill those rows with zeroes in column 2?

Google has been no help, and a solution could save me upwards of an hour x ~15 remaining datasets, so any advice is much appreciated. Thanks!
Non-Excel response, but if the Thruway processes AADT data the same way NYSDOT does, a blank line might just mean "no data" and not 0.  NYSDOT will delete hours containing incomplete (defined as not having all four 15-minute increments recorded) or outlier data (anything that's out of line of what the same hour is for other days recorded, especially if on Mon-Thurs or Friday morning, the days used for AADT calculation) before outputting a report and calculating AADT.
Please note: All comments here represent my own personal opinion and do not reflect the official position of NYSDOT or its affiliates.

webny99

Quote from: vdeane on September 09, 2023, 04:40:53 PM
Quote from: webny99 on September 08, 2023, 06:40:57 PM
I think I recall that a number of users here are familiar with Excel, so I figured I'd see if anyone has any solutions to a problem I'm having.

In trying to compile and analyze Thruway AADT data, I've discovered that all zero data points do not exist in the data. In other words, if 0 vehicles were recorded in a given lane in a given hour, that hour is entirely skipped from the data, as if it did not exist. This is a problem because in order to interpret the data, I've been using the =WRAPCOLS function to organize it in columns of 24 (hours per day) to obtain daily volumes, and this obviously does not work properly if there are missing data points. Even one missing point throws it off, and searching for such among a year's worth of data (8064 data points) is like searching for a needle in a haystack, but even more tedious.

So, I have two columns of data, Column 1 has the date and hour in [MM/DD/YYYY XX:00] format, and Column 2 has the traffic count value for that hour. Is there any function I can use to add rows for all of the missing [MM/DD/YYYY XX:00] values in column 1 and autofill those rows with zeroes in column 2?

Google has been no help, and a solution could save me upwards of an hour x ~15 remaining datasets, so any advice is much appreciated. Thanks!
Non-Excel response, but if the Thruway processes AADT data the same way NYSDOT does, a blank line might just mean "no data" and not 0.  NYSDOT will delete hours containing incomplete (defined as not having all four 15-minute increments recorded) or outlier data (anything that's out of line of what the same hour is for other days recorded, especially if on Mon-Thurs or Friday morning, the days used for AADT calculation) before outputting a report and calculating AADT.

That is certainly a possibility, but so far, the omitted data seems to track more closely with true zero values. It's also broken down by lane, which increases the possibility of zero values occurring: for example, I've encountered missing values in the left lane in the middle of the night hours that would be explained by very light traffic, and missing values in the right lane couple with abnormally high values in the left lane that would be explained by a lane closure.

My understanding is that the data is from the toll gantries themselves, so deleting it isn't as viable as it might be for NYSDOT seeing as it's also used to calculate and bill tolls.

vdeane

Quote from: webny99 on September 09, 2023, 05:27:30 PM
That is certainly a possibility, but so far, the omitted data seems to track more closely with true zero values. It's also broken down by lane, which increases the possibility of zero values occurring: for example, I've encountered missing values in the left lane in the middle of the night hours that would be explained by very light traffic, and missing values in the right lane couple with abnormally high values in the left lane that would be explained by a lane closure.

My understanding is that the data is from the toll gantries themselves, so deleting it isn't as viable as it might be for NYSDOT seeing as it's also used to calculate and bill tolls.
I would't be surprised if the data is more separate than you'd suspect, even if it ultimately comes from the same source.  Different divisions of even the same government agency can be quite siloed.  Aside from vehicle classification data, I imagine the billing department and the data department are looking at different things.  In fact, the data department might not even be allowed to see the stuff the billing department does due to confidentiality policies.  That said, one would hope that any 0s in data are more reliably actual 0s for toll gantry data than for tube count data.

Incidentally, it's NYSDOT policy to get vehicle classification data by lane (volume data only needs to be by direction), so I know there's a way for Traffic Count Editor to deal with that, even if I'm not sure exactly how (I've only done volume count processing myself; Main Office processes the class counts internally).
Please note: All comments here represent my own personal opinion and do not reflect the official position of NYSDOT or its affiliates.

Rothman



Quote from: vdeane on September 09, 2023, 09:33:46 PM
Quote from: webny99 on September 09, 2023, 05:27:30 PM
That is certainly a possibility, but so far, the omitted data seems to track more closely with true zero values. It's also broken down by lane, which increases the possibility of zero values occurring: for example, I've encountered missing values in the left lane in the middle of the night hours that would be explained by very light traffic, and missing values in the right lane couple with abnormally high values in the left lane that would be explained by a lane closure.

My understanding is that the data is from the toll gantries themselves, so deleting it isn't as viable as it might be for NYSDOT seeing as it's also used to calculate and bill tolls.
I would't be surprised if the data is more separate than you'd suspect, even if it ultimately comes from the same source.  Different divisions of even the same government agency can be quite siloed.

^This.  Thruway's divisions are even more their own fiefdoms than NYSDOT Regions.  At least, when I interviewed with them, it certainly came across in spades.

Please note: All comments here represent my own personal opinion and do not reflect the official position(s) of NYSDOT.

webny99

Quote from: vdeane on September 09, 2023, 09:33:46 PM
I would't be surprised if the data is more separate than you'd suspect, even if it ultimately comes from the same source.  Different divisions of even the same government agency can be quite siloed.  Aside from vehicle classification data, I imagine the billing department and the data department are looking at different things.  In fact, the data department might not even be allowed to see the stuff the billing department does due to confidentiality policies.  That said, one would hope that any 0s in data are more reliably actual 0s for toll gantry data than for tube count data.

That would make sense, especially considering it was over a month from my initial request to when I received the count data. (Despite the missing data issue, I've been satisfied with the data quality overall. I'm just finishing up the last few gantries and will share more on my findings in the Thruway thread shortly.)

kphoger

Quote from: MikeTheActuary on September 08, 2023, 09:45:58 PM
The VLOOKUP section should be VLOOKUP(A1,'Other Sheet'!A1:B10000,2), with the "2" saying pull the value from the second column.

For the sake of clarity, in case it comes up, the "2" says pull the value from the next column to the right of the target cell (here represented as A1).  That is to say, =VLOOKUP(M2,BlahBlah!A2:T9999,2,FALSE) will return the value from Cell B2 of BlahBlah. But =VLOOKUP(M2,BlahBlah!G2:T9999,2,FALSE) will return the value from Cell H2 of BlahBlah–not the value from Cell B2.

Quote from: webny99 on September 08, 2023, 10:45:40 PM
At first, it was returning #NA when I hit a cell that needed a 0, but then I realized the formula range was moving down with each cell (turns out that was my problem with Jayhawkco's formula, too), so as soon as I was offset by a row, the equivalent cell for the next row was no longer in the range to pull from. I solved that by just moving both datasets down a few hundred cells while starting the formulas ranges from row 1. There's probably an easy fix for that, but this is still a great deal easier than what I was doing before. Thanks again!

If you don't want the "pull from" cell references to change as you fill-down, then use the almighty dollar sign.  C2 will change to C3 and then C4 as you fill-down.  But $C$2 will stay the same as you fill-down.  Basically, $C$2 means "I mean keep it as C2, no matter how I fill-down or fill-right).

For example, as you fill-down:

=VLOOKUP(M2,BlahBlah!A2:T5000,4,FALSE) ... will become ...
=VLOOKUP(M3,BlahBlah!A3:T5001,4,FALSE) ... on Row 2, and then ...
=VLOOKUP(M4,BlahBlah!A4:T5002,4,FALSE) ... on Row 3.

However, as you fill-down:

=VLOOKUP(M2,BlahBlah!$A$2:$T$5000,4,FALSE) ... will become ...
=VLOOKUP(M3,BlahBlah!$A$2:$T$5000,4,FALSE) ... on Row 2, and then ...
=VLOOKUP(M4,BlahBlah!$A$2:$T$5000,4,FALSE) ... on Row 3.
Keep right except to pass.  Yes.  You.
Visit scenic Orleans County, NY!
Male pronouns, please.

Quote from: Philip K. DickIf you can control the meaning of words, you can control the people who must use them.

MikeTheActuary

Quote from: jeffandnicole on September 09, 2023, 09:37:02 AM#1: Congrats on the 1.3 GB file. That certainly blows me away. I am honestly impressed you created something that large.

With a terrabyte of RAM and 36 cores, many abuses of Excel are possible.

webny99

Quote from: kphoger on September 10, 2023, 03:30:41 PM
Quote from: MikeTheActuary on September 08, 2023, 09:45:58 PM
The VLOOKUP section should be VLOOKUP(A1,'Other Sheet'!A1:B10000,2), with the "2" saying pull the value from the second column.

For the sake of clarity, in case it comes up, the "2" says pull the value from the next column to the right of the target cell (here represented as A1).  That is to say, =VLOOKUP(M2,BlahBlah!A2:T9999,2,FALSE) will return the value from Cell B2 of BlahBlah. But =VLOOKUP(M2,BlahBlah!G2:T9999,2,FALSE) will return the value from Cell H2 of BlahBlah–not the value from Cell B2.

Quote from: webny99 on September 08, 2023, 10:45:40 PM
At first, it was returning #NA when I hit a cell that needed a 0, but then I realized the formula range was moving down with each cell (turns out that was my problem with Jayhawkco's formula, too), so as soon as I was offset by a row, the equivalent cell for the next row was no longer in the range to pull from. I solved that by just moving both datasets down a few hundred cells while starting the formulas ranges from row 1. There's probably an easy fix for that, but this is still a great deal easier than what I was doing before. Thanks again!

If you don't want the "pull from" cell references to change as you fill-down, then use the almighty dollar sign.  C2 will change to C3 and then C4 as you fill-down.  But $C$2 will stay the same as you fill-down.  Basically, $C$2 means "I mean keep it as C2, no matter how I fill-down or fill-right).

Thanks for the clarification on both counts.  :thumbsup: (I should've known that about the dollar signs, but if I did, I'd forgotten.)




Anyways, thanks to everyone that replied. I'm glad this led to a solution and some interesting discussion. I've finished up what I was working on with the Thruway data and more details on that can be found here.



Opinions expressed here on belong solely to the poster and do not represent or reflect the opinions or beliefs of AARoads, its creators and/or associates.