News:

Check out the AARoads Wiki!

Main Menu

Microsoft Excel

Started by kphoger, July 15, 2020, 03:44:24 PM

Previous topic - Next topic

zachary_amaryllis

Quote from: Scott5114 on April 15, 2022, 09:59:28 PM
Y'all are gonna be mad if you ever hear about Linux's middle-click paste...

omg how cool is that, right?

plus copy/paste works to other devices, even. i use kde connect for my phone, so if i copy something on the computer, it's on the phone ready to be pasted, and vice versa.
clinched:
I-64, I-80, I-76 (west), *64s in hampton roads, 225,270,180 (co, wy)


zachary_amaryllis

Quote from: Scott5114 on April 16, 2022, 06:34:55 PM
I don't have access to Excel, but I do have vi and emacs, so...

some serious nerd cred, right there. respect.

i'm a nano man, myself.
clinched:
I-64, I-80, I-76 (west), *64s in hampton roads, 225,270,180 (co, wy)

Scott5114

Quote from: zachary_amaryllis on April 17, 2022, 01:49:01 AM
Quote from: Scott5114 on April 16, 2022, 06:34:55 PM
I don't have access to Excel, but I do have vi and emacs, so...

some serious nerd cred, right there. respect.

i'm a nano man, myself.

nano is pretty cool, and probably better than vi for most people, but vi is the one I learned first and is usually more likely to be preinstalled. My actual daily driver for plain-text editing is kate. emacs is a break-glass-in-case-of-emergency sort of thing; I have it installed but mostly to open it up once every few months to remind myself that I don't like pressing Ctrl that much.

Haven't heard of KDE Connect, so I need to look into that . . .
uncontrollable freak sardine salad chef

Dirt Roads

Quote from: kphoger on April 16, 2022, 07:07:50 PM
Ick, large numbers are something I deal with every day at work.  Unfortunately, I get some of the data sent to me as an Excel file via e-mail with the numbers already having been converted to exponentials–such that changing them back into text rounds them to the nearest million or whatever.  Which isn't exactly useful, considering that those numbers are serial numbers and the last several digits kind of matter.

Quote from: kalvado on April 16, 2022, 09:46:38 PM
There is a book, "humble pi" - about mathematical and near-math issues all over the place. Excel is a subject of a separate chapter.
Phone numbers distorted is a simple one. Issue I really like, though, is more involved:
QuoteIn 2016 three intrepid researchers in Melbourne analysed eighteen journals which had published genome research between 2005 and 2015 and found a total of 35,175 publicly available Excel files associated with 3,597 different research papers...... the researchers were left with 987 Excel spreadsheets which had gene name errors introduced by Excel. In their sample, they found that 19.6 per cent of gene research crunched in Excel contained errors.
Examples are SEP15 and MARCH5 genes which Excel converts into dates.

Quote from: Scott5114 on April 16, 2022, 09:57:26 PM
You see things like that because there is no easy-to-use, well-understood database software that is as widely available as Excel is. So when people have a data grid, they don't know how to make the determination between when to use a spreadsheet and when to use a database...so a spreadsheet program gets pressed into use because it's "good enough".

Quote from: kalvado on April 16, 2022, 10:02:09 PMIt is more than that. Complex Excel spreadsheets are essentially programs made by non-programmers, with no testing, often no planning.
From the same book
QuoteThe European Spreadsheet Risks Interest Group (yes, that is a real organization, one dedicated to examining the moments when spreadsheets go wrong) estimates that over 90 per cent of all spreadsheets contain errors. Around 24 per cent of spreadsheets that use formulas contain a direct maths error in their computations.

You are all venturing into my world.  Amazingly, I have caught Microsoft Excel mistakes on complex system simulations on several different occasions.  When we analyzed the mistakes, there was nothing wrong with the algorithms.  But either Microsoft or Windows itself was incorrectly computing the cell contents.  The issue was reproducible several times, but the error would go away after the computer was rebooted.  If you are wondering whether the computer itself could have generated these errors, indeed it could.  But Microsoft's error detection/error correction is so efficient that it is unlikely to fail once much less on back-to-back occasions.  (The old computer that I'm on right now probably has a million bit errors per second and hardly any of them cause any problems).  But Microsoft is quite sloppy by not detecting whether its own Excel code encounters a glitch that generates an error.

Anyhow, this all creates a nightmare for guys like me who serve as the line of defense in safety certification.  (Fortunately, our simulations were not for safety purposes even though they could be used for such, because that was always outside of our scope of work).

Dirt Roads

Quote from: Scott5114 on April 16, 2022, 11:30:44 PM
But if I went to your house and started smacking your air conditioner around with a hammer you'd kick me off your property.

Hammer time.  One of my former lives on the railroad included repair work on 1950s electronic optic sensor systems interfaced with 1970s microprocessor-based data analytical systems (plus radio interfaces).  Some of the interface boards were handmade.  I would get an emergency call to fix a "computer problem" in the field and I would show up with rubber hammer and an old blow dryer with the heating element turned off.  Invariably, they thought I was joking.  The fellow I replaced always soldered-out a new capacitor (or two) on the homemade circuit boards, which I was certainly capable of doing.  I fixed 100% of the problems my way, no soldering involved.  Believe it or not, I learned the hammer and blower technique in a college course.

Rothman

Quote from: Dirt Roads on April 17, 2022, 06:01:10 PM
Quote from: kphoger on April 16, 2022, 07:07:50 PM
Ick, large numbers are something I deal with every day at work.  Unfortunately, I get some of the data sent to me as an Excel file via e-mail with the numbers already having been converted to exponentials–such that changing them back into text rounds them to the nearest million or whatever.  Which isn't exactly useful, considering that those numbers are serial numbers and the last several digits kind of matter.

Quote from: kalvado on April 16, 2022, 09:46:38 PM
There is a book, "humble pi" - about mathematical and near-math issues all over the place. Excel is a subject of a separate chapter.
Phone numbers distorted is a simple one. Issue I really like, though, is more involved:
QuoteIn 2016 three intrepid researchers in Melbourne analysed eighteen journals which had published genome research between 2005 and 2015 and found a total of 35,175 publicly available Excel files associated with 3,597 different research papers...... the researchers were left with 987 Excel spreadsheets which had gene name errors introduced by Excel. In their sample, they found that 19.6 per cent of gene research crunched in Excel contained errors.
Examples are SEP15 and MARCH5 genes which Excel converts into dates.
Quote from: Scott5114 on April 16, 2022, 09:57:26 PM
You see things like that because there is no easy-to-use, well-understood database software that is as widely available as Excel is. So when people have a data grid, they don't know how to make the determination between when to use a spreadsheet and when to use a database...so a spreadsheet program gets pressed into use because it's "good enough".
Quote from: kalvado on April 16, 2022, 10:02:09 PMIt is more than that. Complex Excel spreadsheets are essentially programs made by non-programmers, with no testing, often no planning.
From the same book
QuoteThe European Spreadsheet Risks Interest Group (yes, that is a real organization, one dedicated to examining the moments when spreadsheets go wrong) estimates that over 90 per cent of all spreadsheets contain errors. Around 24 per cent of spreadsheets that use formulas contain a direct maths error in their computations.

You are all venturing into my world.  Amazingly, I have caught Microsoft Excel mistakes on complex system simulations on several different occasions.  When we analyzed the mistakes, there was nothing wrong with the algorithms.  But either Microsoft or Windows itself was incorrectly computing the cell contents.  The issue was reproducible several times, but the error would go away after the computer was rebooted.  If you are wondering whether the computer itself could have generated these errors, indeed it could.  But Microsoft's error detection/error correction is so efficient that it is unlikely to fail once much less on back-to-back occasions.  (The old computer that I'm on right now probably has a million bit errors per second and hardly any of them cause any problems).  But Microsoft is quite sloppy by not detecting whether its own Excel code encounters a glitch that generates an error.

Anyhow, this all creates a nightmare for guys like me who serve as the line of defense in safety certification.  (Fortunately, our simulations were not for safety purposes even though they could be used for such, because that was always outside of our scope of work).
That's really disturbing.  If it was fixed upon reboot, makes me wonder about the calculation engine completing properly.
Please note: All comments here represent my own personal opinion and do not reflect the official position(s) of NYSDOT.

kkt

Quote from: Scott5114 on April 16, 2022, 06:34:55 PM
I don't have access to Excel, but I do have vi and emacs, so...

  :clap:

abefroman329

Hey fellas, if I shouldn't be using Excel to edit CSVs, what should I be using?

Rothman

Quote from: abefroman329 on April 18, 2022, 08:29:58 AM
Hey fellas, if I shouldn't be using Excel to edit CSVs, what should I be using?
I have never had an issue importing .csvs into Excel.
Please note: All comments here represent my own personal opinion and do not reflect the official position(s) of NYSDOT.

kalvado

Quote from: Rothman on April 17, 2022, 11:23:33 PM
Quote from: Dirt Roads on April 17, 2022, 06:01:10 PM
Quote from: kphoger on April 16, 2022, 07:07:50 PM
Ick, large numbers are something I deal with every day at work.  Unfortunately, I get some of the data sent to me as an Excel file via e-mail with the numbers already having been converted to exponentials–such that changing them back into text rounds them to the nearest million or whatever.  Which isn't exactly useful, considering that those numbers are serial numbers and the last several digits kind of matter.

Quote from: kalvado on April 16, 2022, 09:46:38 PM
There is a book, "humble pi" - about mathematical and near-math issues all over the place. Excel is a subject of a separate chapter.
Phone numbers distorted is a simple one. Issue I really like, though, is more involved:
QuoteIn 2016 three intrepid researchers in Melbourne analysed eighteen journals which had published genome research between 2005 and 2015 and found a total of 35,175 publicly available Excel files associated with 3,597 different research papers...... the researchers were left with 987 Excel spreadsheets which had gene name errors introduced by Excel. In their sample, they found that 19.6 per cent of gene research crunched in Excel contained errors.
Examples are SEP15 and MARCH5 genes which Excel converts into dates.
Quote from: Scott5114 on April 16, 2022, 09:57:26 PM
You see things like that because there is no easy-to-use, well-understood database software that is as widely available as Excel is. So when people have a data grid, they don't know how to make the determination between when to use a spreadsheet and when to use a database...so a spreadsheet program gets pressed into use because it's "good enough".
Quote from: kalvado on April 16, 2022, 10:02:09 PMIt is more than that. Complex Excel spreadsheets are essentially programs made by non-programmers, with no testing, often no planning.
From the same book
QuoteThe European Spreadsheet Risks Interest Group (yes, that is a real organization, one dedicated to examining the moments when spreadsheets go wrong) estimates that over 90 per cent of all spreadsheets contain errors. Around 24 per cent of spreadsheets that use formulas contain a direct maths error in their computations.

You are all venturing into my world.  Amazingly, I have caught Microsoft Excel mistakes on complex system simulations on several different occasions.  When we analyzed the mistakes, there was nothing wrong with the algorithms.  But either Microsoft or Windows itself was incorrectly computing the cell contents.  The issue was reproducible several times, but the error would go away after the computer was rebooted.  If you are wondering whether the computer itself could have generated these errors, indeed it could.  But Microsoft's error detection/error correction is so efficient that it is unlikely to fail once much less on back-to-back occasions.  (The old computer that I'm on right now probably has a million bit errors per second and hardly any of them cause any problems).  But Microsoft is quite sloppy by not detecting whether its own Excel code encounters a glitch that generates an error.

Anyhow, this all creates a nightmare for guys like me who serve as the line of defense in safety certification.  (Fortunately, our simulations were not for safety purposes even though they could be used for such, because that was always outside of our scope of work).
That's really disturbing.  If it was fixed upon reboot, makes me wonder about the calculation engine completing properly.
Millions of bit errors is a joke. If I remember correctly, current error rate is 1 per gigabyte of RAM per day.
However, complex systems are prone to bugs. Often one of a kind, hard to reproduce and so on. My gut feeling is that MS is not even trying to catch such things anymore.

kphoger

Quote from: Dirt Roads on April 17, 2022, 06:11:54 PM

Quote from: Scott5114 on April 16, 2022, 11:30:44 PM
But if I went to your house and started smacking your air conditioner around with a hammer you'd kick me off your property.

Hammer time.  One of my former lives on the railroad included repair work on 1950s electronic optic sensor systems interfaced with 1970s microprocessor-based data analytical systems (plus radio interfaces).  Some of the interface boards were handmade.  I would get an emergency call to fix a "computer problem" in the field and I would show up with rubber hammer and an old blow dryer with the heating element turned off.  Invariably, they thought I was joking.  The fellow I replaced always soldered-out a new capacitor (or two) on the homemade circuit boards, which I was certainly capable of doing.  I fixed 100% of the problems my way, no soldering involved.  Believe it or not, I learned the hammer and blower technique in a college course.

Network interface devices/units for analog telephony have been know to work better after taking them off the side of the house and kicking them around the yard for a few minutes...

He Is Already Here! Let's Go, Flamingo!
Dost thou understand the graveness of the circumstances?
Deut 23:13
Male pronouns, please.

Quote from: PKDIf you can control the meaning of words, you can control the people who must use them.

Scott5114

Quote from: abefroman329 on April 18, 2022, 08:29:58 AM
Hey fellas, if I shouldn't be using Excel to edit CSVs, what should I be using?

A CSV file is, at its base level, a text file that has each record on its own line and field values separated by commas. If a field has a comma, space, or line break in it, it must be surrounded by quotes that tell the parser to ignore anything between the quotes.

The following is a valid CSV.

"People posting on this page of this thread","Total post count"
zachary_amaryllis,719
zachary_amaryllis,719
Scott5114,14590
"Dirt Roads",1461
"Dirt Roads",1461
Rothman,9594
kkt,6107
abefroman329,3781
Rothman,9594
kalvado,4567
kphoger,21016


Note the quotes that are required around "Dirt Roads".

And indeed, if we open it in a spreadsheet program (LibreOffice in this case), we see that it functions as expected:



However, you can see the spreadsheet has right-aligned the post numbers, indicating that it recognizes them as numbers and is applying autoformatting to them. That autoformatting is what makes editing CSVs in spreadsheets so insidious, because it can result in data loss if the type of data being stored is a "string of numbers" like a serial number, as opposed to a number used to represent value.

So if you want to be 100% certain that you won't lose data, you need to open it in a plain text editor. (At least to sanity-check the CSV output after it's been edited in a spreadsheet.) Unfortunately, the plain text editor Windows ships with is Notepad, which is borderline useless for any sort of in-depth editing. As mentioned above, I like the Kate editor, which is available for Windows, but there are plenty of others.
uncontrollable freak sardine salad chef

kalvado

Quote from: Scott5114 on April 18, 2022, 03:58:53 PM
Quote from: abefroman329 on April 18, 2022, 08:29:58 AM
Hey fellas, if I shouldn't be using Excel to edit CSVs, what should I be using?

A CSV file is, at its base level, a text file that has each record on its own line and field values separated by commas. If a field has a comma, space, or line break in it, it must be surrounded by quotes that tell the parser to ignore anything between the quotes.

The following is a valid CSV.

"People posting on this page of this thread","Total post count"
zachary_amaryllis,719
zachary_amaryllis,719
Scott5114,14590
"Dirt Roads",1461
"Dirt Roads",1461
Rothman,9594
kkt,6107
abefroman329,3781
Rothman,9594
kalvado,4567
kphoger,21016


Note the quotes that are required around "Dirt Roads".

And indeed, if we open it in a spreadsheet program (LibreOffice in this case), we see that it functions as expected:



However, you can see the spreadsheet has right-aligned the post numbers, indicating that it recognizes them as numbers and is applying autoformatting to them. That autoformatting is what makes editing CSVs in spreadsheets so insidious, because it can result in data loss if the type of data being stored is a "string of numbers" like a serial number, as opposed to a number used to represent value.

So if you want to be 100% certain that you won't lose data, you need to open it in a plain text editor. (At least to sanity-check the CSV output after it's been edited in a spreadsheet.) Unfortunately, the plain text editor Windows ships with is Notepad, which is borderline useless for any sort of in-depth editing. As mentioned above, I like the Kate editor, which is available for Windows, but there are plenty of others.
And of course there is a way to avoid that specific problem:

It doesn't mean you cannot face a different issue, though.
I would say that CSV import, if files of similar format are imported once,  needs to be worked out and possibly automated. At a base level, Excel is just too universal to fit specific purposes.  (replace CSV import and Excel with your favorite task as you like)
If your task and file content is very different every time, though... Any software may give users hard time with that.

hotdogPi

Putting the serial number in quotes in the CSV makes it interpreted as text, right?
Clinched

Traveled, plus
US 13, 50
MA 10,22,35,40,53,63,79,107,109,126,138,141,151,159
NH 27,78,111A; CA 90; NY 9A,40,366,423; RI 117; CT 32,193,320; VT 2A,5A; PA 3,51,60, GA 42,140; FL A1A,7; WA 202; QC 162,165,263; 🇬🇧A100,A3211,A3213,A3215; 🇫🇷95 D316

New traveled: I-80, I-287, I-684, CT 72, GSP

Scott5114

#89
Quote from: 1 on April 18, 2022, 05:29:53 PM
Putting the serial number in quotes in the CSV makes it interpreted as text, right?

Quotes merely tell a CSV parser that all values from there to the next quote are to be treated as one field. Otherwise a value like "Minneapolis, Minnesota" would incorrectly induce a field break. (This leads to the question of "what if a value includes a quote?" which is answered by treating two consecutive quotes as a literal quote character.) The CSV format makes no distinction between data types stored in its fields; it doesn't know if a string of characters is a literal string or a number. That determination is made by whatever tool is used to open and parse the data, which is why a text editor will not introduce errors in the file; it makes no attempt to determine data types and therefore the display will always reflect the actual data stored in the file.

Accordingly, then, the string test,"02134" in LibreOffice using the default CSV-import settings is displayed as 2134, indicating that it has been parsed as a number.

To be fair to spreadsheets, the CSV format is notoriously messy and difficult to parse. (If you wanted to represent two consecutive literal quotes in a field, you would have to encode it as """""".) Eric S. Raymond specifically called out the failures of the CSV format in his book The Art of Unix Programming (which is an excellent read for anyone interested in computers, whether or not they are programmers or Unix users, as it is more a chronicle of good software design practices) in a thorough enough way that I won't bother rehashing them here. But CSVs are easy to generate and, unlike more robust text-file formats like XML, don't bake any particulars about the data (such as field names or data types) into the structure of the file. That means that once you have gone through the ordeal of writing a CSV parser, you can reuse it for many different types of data. That flexibility is why the CSV format has stuck around, despite its flaws.
uncontrollable freak sardine salad chef

JayhawkCO

Doesn't seem like anyone uses TSVs anymore. It solves issues like your Minneapolis, Minnesota example.

Scott5114

#91
Quote from: JayhawkCO on April 18, 2022, 06:06:20 PM
Doesn't seem like anyone uses TSVs anymore. It solves issues like your Minneapolis, Minnesota example.

Tabs have the downside of being difficult to distinguish from runs of spaces when viewed by a human in a text editor. Especially when a field is long enough that it is only 1 or 2 characters away from the next tab stop. Some editors compensate for this by displaying tabs differently (Kate adds a small chevron graphic to the beginning of tabs) but it's still frustrating.

On Unix, you usually see colon-delimited fields, with literal colons represented by \:, literal newlines represented by \n, and literal backslashes represented by \\. (This could theoretically lead to backslash explosion much like the quote example given above, but it's much cleaner since there will always be an even number of backslashes, and backslashes are far less likely to occur in most data than quotes are.) This is so simple to parse I'm pretty confident that I, a rank amateur, could probably have a working parser going in 30 minutes.
uncontrollable freak sardine salad chef

kphoger

Quote from: 1 on April 18, 2022, 05:29:53 PM
Putting the serial number in quotes in the CSV makes it interpreted as text, right?

It helps to convert your cells to text format before pasting-as-values into them.  But that doesn't help when the data I get from outside the company has already been number-format-ruined.

He Is Already Here! Let's Go, Flamingo!
Dost thou understand the graveness of the circumstances?
Deut 23:13
Male pronouns, please.

Quote from: PKDIf you can control the meaning of words, you can control the people who must use them.

SectorZ

So the whole number to date thing Excel forces is fun.

I get that if I put in "9-21" and it wants to change it to "21-Sep", OK, I get it.

When I put in "9-41" and it changes it to "41-Sep", I get a little pissed off.

hotdogPi

#94
Quote from: SectorZ on April 18, 2022, 08:33:09 PM
So the whole number to date thing Excel forces is fun.

I get that if I put in "9-21" and it wants to change it to "21-Sep", OK, I get it.

When I put in "9-41" and it changes it to "41-Sep", I get a little pissed off.

In Numbers, the Mac equivalent of Excel:
9-20 → September 20, current year
9-40 → September 1, 2040
9/20 → September 20, current year
9/40 → 0.225

1-49 → January 1, 2049
1-50 → January 1, 1950
Clinched

Traveled, plus
US 13, 50
MA 10,22,35,40,53,63,79,107,109,126,138,141,151,159
NH 27,78,111A; CA 90; NY 9A,40,366,423; RI 117; CT 32,193,320; VT 2A,5A; PA 3,51,60, GA 42,140; FL A1A,7; WA 202; QC 162,165,263; 🇬🇧A100,A3211,A3213,A3215; 🇫🇷95 D316

New traveled: I-80, I-287, I-684, CT 72, GSP

kalvado

Quote from: SectorZ on April 18, 2022, 08:33:09 PM
So the whole number to date thing Excel forces is fun.

I get that if I put in "9-21" and it wants to change it to "21-Sep", OK, I get it.

When I put in "9-41" and it changes it to "41-Sep", I get a little pissed off.
Did you ever try to microwave something for 7 min 77 sec ( 7-7-7-start)?

kurumi

Joke making the rounds: What do Excel and incels have in common? They often think it's a date when it really isn't.
My first SF/horror short story collection is available: "Young Man, Open Your Winter Eye"

BlueSky: https://bsky.app/profile/therealkurumi.bsky.social

abefroman329

Quote from: kurumi on April 19, 2022, 12:22:50 PM
Joke making the rounds: What do Excel and incels have in common? They often think it's a date when it really isn't.
:-D

SectorZ

Quote from: kurumi on April 19, 2022, 12:22:50 PM
Joke making the rounds: What do Excel and incels have in common? They often think it's a date when it really isn't.

This is the first time I've heard an incel joke that was actually funny.  :D

kkt

Quote from: kalvado on April 18, 2022, 09:07:05 PM
Quote from: SectorZ on April 18, 2022, 08:33:09 PM
So the whole number to date thing Excel forces is fun.

I get that if I put in "9-21" and it wants to change it to "21-Sep", OK, I get it.

When I put in "9-41" and it changes it to "41-Sep", I get a little pissed off.
Did you ever try to microwave something for 7 min 77 sec ( 7-7-7-start)?

I am often microwaving things for 90 seconds:

9 0 start

instead of

1 3 0 start

Quote from: kurumi on April 19, 2022, 12:22:50 PM
Joke making the rounds: What do Excel and incels have in common? They often think it's a date when it really isn't.

:-D

Yeah, Excel is a great quick and dirty tool for getting a general picture of data or a one-shot calculation.  But if you're using the data a lot or if it's really critical it's not a substitute for a database which will barf if you try to add numbers and words.