Monday, June 7, 2010

Methods of data organization

This summer I've got a lot of data to analyze. Do any of my readers have suggestions for how to organize your files and analyses so that you can keep track of what has been done? It's far too easy to modify a bunch of data, do some analyses, then come back a week later and not remember what was done or why. How do you do it, o experienced readers?

When I took a GIS class, they taught us an organization system where you always always always make a folder for your original data within your project folder and then never touch it. Then you have another folder for the stuff you're modifying. Most importantly, you keep a text file where you explain step by step everything that you are doing. This system seems like a pretty good starting point, but I'm looking for other suggestions. Mostly I just don't want to end up with something like this:
Thank you, Ph.D. Comics, for being so topical.
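
For anyone who would rather script the GIS-class system than set it up by hand, here is a minimal Python sketch. The names `original_data`, `working`, and `analysis_log.txt` are placeholders invented for this example, not part of what the class taught; the only ideas taken from the class are an untouched raw-data folder, a separate working folder, and a step-by-step text log.

```python
from datetime import datetime
from pathlib import Path


def init_project(root):
    """Create the raw-data folder, the working folder, and an empty log file."""
    root = Path(root)
    # Folder and file names here are placeholders; rename to suit the project.
    (root / "original_data").mkdir(parents=True, exist_ok=True)
    (root / "working").mkdir(exist_ok=True)
    log = root / "analysis_log.txt"
    log.touch(exist_ok=True)
    return log


def log_step(log, note):
    """Append one timestamped line describing what was done and why."""
    stamp = datetime.now().strftime("%Y-%m-%d %H:%M")
    with open(log, "a", encoding="utf-8") as fh:
        fh.write(f"{stamp}  {note}\n")
```

Calling `log = init_project("my_project")` once and then `log_step(log, "Removed blank rows from the working copy")` after each change builds up the kind of running record described above. Nothing in the sketch stops you from editing the raw folder, so the "never touch it" rule still relies on habit.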

11 comments:

jaxzwolf said...

The tier system has always worked really well for me. I always keep a folder with the original data, then a second folder for analyses, and within that folder other folders for each type. Er... hard to explain without a picture of some sort. I'll try: Folder for measurement A = FMA, Folder for measurement B = FMB, etc.

   Main folder with original data
      /          |          \
    FMA         FMB         FMC
     |           |           |
   stats       stats       stats


Each statistical test has its own file name, for example, MeasurementA_ANOVA_Thing1vsThing2, and I usually include my step-by-step explanation of procedures as part of the output file, if at all possible (depends on the statistical software in use whether or not you can add text to the output).

Um, does that make any sense at all?

Karina said...

Thanks jaxzwolf! That's wonderful. Thank you for sharing your organization system!

Anonymous said...

As for your field assistant, I work in an East African country, and when you fire someone you have to give severance pay. I don't know if the country you're in works the same (or is the same...), but that is a possible solution if you have to fire this guy. And if he doesn't reliably show up, I'd fire him sooner rather than later (better for having to train a replacement).

Karina said...

Thanks Anonymous. I'm not sure if severance pay is typical but I wouldn't be surprised if it is. I really hope I don't have to fire him as I do really enjoy working with him when he's there. We'll see what happens...

Anonymous said...

same anon again:

One possible solution is to talk to your assistant about his inability to get back on time, citing the problem and asking if a different off schedule would be better for him. Point out that his data collection is good, but reliably showing up is important too, and matters for raises and bonuses. We have to give 4 days (plus public holidays) to ours, but often send them home for 2-3 days at a time (given transportation here, working 10 days straight and going home 2-3, repeat seems to work better than just giving them Sundays off). If he does a good job and you get along well, it'd be a pity to lose him over this.

And as someone who works with "critters" of my own in an East African country, I'm curious which ones you work in and on (some of what you say is very similar to my experience).

Karina said...

I do need to have another conversation with him, but for the time being I can't do anything really, because I have only been told by third parties that he isn't showing up for work (i.e. both of my field assistants are hiding the poor work habits of one).

Also, they live in the area where I work, so they (presumably) go home every night. Things get complicated when aforementioned field assistant leaves home and then doesn't come back on time. But maybe changing the schedule is something worth considering. I'll definitely keep that in mind.

Thanks for your comments! I hope you keep reading and commenting. I can't think of any other commenters who (openly) also work in Africa.

Anonymous said...

I will definitely keep reading/commenting (I'll sign myself African Fieldworker so you know who I am)...found your blog a few months ago and love it! I'm actually in East Africa now (going on over a year, staying until December...my critters take a long time to observe!). I don't have a blog of my own...doubt I could keep the details from outing myself...

Michelle said...

I follow a similar system to the one you learned in your GIS class. I always keep a folder of original, untouched data, and then as I work on the data in a separate folder, I save each iteration with the date in the filename so I know when it was last updated.
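
A dated-filename habit like this is easy to automate. The Python sketch below is only one way to do it; the function names and the `name_YYYY-MM-DD.ext` pattern are assumptions made for illustration, not a description of any particular tool. An ISO-style date is used because files named that way sort in chronological order.

```python
import shutil
from datetime import date
from pathlib import Path


def dated_name(path, day=None):
    """Return the path with an ISO date inserted before the extension."""
    path = Path(path)
    day = day or date.today()
    return path.with_name(f"{path.stem}_{day.isoformat()}{path.suffix}")


def save_iteration(path, day=None):
    """Copy a working file to a dated sibling and return the new path."""
    target = dated_name(path, day)
    shutil.copy2(path, target)  # copy2 also keeps the file's timestamps
    return target
```

For example, `dated_name("working/survey.csv", date(2010, 6, 7))` gives `working/survey_2010-06-07.csv`. Saving twice on the same day overwrites that day's copy, so add a time or a counter if you need finer steps.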

(BTW, if you were wondering where I came from, I used to blog under 'panthera studentessa'. I took a much-needed break from the blogosphere, but I remember enjoying your blog very much.)

Karina said...

Michelle- I'm so glad you're blogging again! I missed your blog and wondered what happened. I'll follow your new one now :-)

Pablo said...

It might seem obvious, but you should also back up your data frequently, both the original files and the modified ones. Being a systems analyst myself, I have a lot of files related to programs in development; I make progressive copies of each folder containing each project. For example: if I start development on program "A", I make a backup of the original files, and then copy the folder containing all development files every time I modify something, so that it is relatively easy to roll back to a recent version if something goes wrong. So, I might end up having a folder for A (original), and for A1, A2, A3, etc. It is easy and inexpensive to do so and could save valuable research or development work.
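
The A, A1, A2, A3 scheme can be sketched in a few lines of Python. The helper name `snapshot` is made up for this example, and the sketch assumes the numbered copies sit next to the project folder. It only makes local copies, so it is no substitute for a backup on another disk.

```python
import shutil
from pathlib import Path


def snapshot(folder):
    """Copy `folder` to the next free numbered sibling (A -> A1, A2, ...)."""
    folder = Path(folder)
    n = 1
    # Find the first number that is not already taken by an earlier copy.
    while (folder.parent / f"{folder.name}{n}").exists():
        n += 1
    target = folder.parent / f"{folder.name}{n}"
    shutil.copytree(folder, target)
    return target
```

Running `snapshot("A")` before each round of changes leaves `A` as the live folder and `A1`, `A2`, ... as earlier states to roll back to.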

Karina said...

Thanks for the reminder, Pablo. I do regularly back up my data using Apple's Time Machine, which backs up any files that have been modified. I also often save new versions when I'm starting some big changes.