By default, an Nvidia Tesla C20XX card is configured to run as a high performance multithreaded coprocessor. You can use it to run CUDA and OpenCL programs, but it doesn't show up as an actual graphics card. If you hook up a display to it, the display runs with Microsoft's vgasave driver, and does not show up as a DirectX device.
I spent pretty much an entire day trying to figure out how to get it to run as a graphics card. In the back of my mind, I knew I had done it before, but just couldn't remember how. I spent ages on live chat with Nvidia customer support. I knew better than to trust first line support, and they were indeed a waste of time. I eventually eked the memory back into existence.
In the driver install folder (typically "C:\Program Files\NVIDIA Corporation\NVSMI"), run the command "nvidia-smi -dm 0". This sets the driver to run in graphics mode, treating your Tesla card as a graphics card.
There are a number of reasons NVIDIA has two separate driver modes. One is that graphics processing tasks have a maximum time limit of 2 seconds, after which Windows thinks the GPU has hung, and resets it. If you have some high performance task running, say a raytracing operation that takes a minute, this may fail. Running in a pure compute mode avoids this problem, but also stops the card from appearing as a graphics device.
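If you end up flipping modes more than once, here's a minimal Python sketch that wraps the command above. The function names are my own invention, the folder is just the "typical" path mentioned above, and I'm assuming the prompt needs to be elevated. Mode 0 is the graphics mode from this post; as far as I know mode 1 switches back to compute mode.

```python
import subprocess
from pathlib import PureWindowsPath

# Assumed install location: the "typical" folder mentioned in the post.
DEFAULT_SMI_DIR = r"C:\Program Files\NVIDIA Corporation\NVSMI"


def driver_model_command(mode, smi_dir=DEFAULT_SMI_DIR):
    """Build the nvidia-smi command line that selects a driver mode.

    mode 0 = graphics mode (what this post uses), mode 1 = compute mode.
    """
    if mode not in (0, 1):
        raise ValueError("mode must be 0 (graphics) or 1 (compute)")
    exe = PureWindowsPath(smi_dir) / "nvidia-smi.exe"
    return [str(exe), "-dm", str(mode)]


def set_driver_model(mode, smi_dir=DEFAULT_SMI_DIR):
    """Run the command. Assumption: this needs an administrator prompt."""
    return subprocess.run(driver_model_command(mode, smi_dir), check=True)
```

Calling set_driver_model(0) from an administrator Python session should be equivalent to typing the command by hand; driver_model_command(0) just returns the argument list so you can inspect it first.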
Feb 24, 2012
Feb 8, 2012
Progress.
Big fan of this film. Often non-techies seem to fear technology, because it's disruptive and changes the world. Age-old industries fall apart when they don't embrace the change. It should bring fear to those profiting from the old business, but for the everyman, it should be a time of huge opportunity.
I suspect those who are afraid, aren't really looking hard enough for those opportunities.
PressPausePlay from House of Radon on Vimeo.
Jan 8, 2012
Mmmm Fat.
There's this dish everyone in my family loves. I've always known it as "dee pong", because that's the Shanghainese name for it. It's some part of a pig's leg - that I know because it's got two big bones in it, along with the meat, fat and skin. It's braised in soy sauce and other spices, until the collagen rich skin is the texture of jello and the meat and fat melt off the bone. It's as rich as any foie, but meatier tasting. Unfortunately, there's probably only so much you can eat before keeling over from cardiac arrest.
I tried making it over the holidays. The first challenge was finding a recipe. Searching for combinations of "shanghai pork shoulder leg hock" led me to four similar recipes of varying complication (note: "ti pang" seems to be the most common spelling online). I then decided to screw with the recipes by using my mom's new thermal vacuum pot, using some fancy low-temperature cooking technique. Or so I imagined.
I tried the recipe twice, and the results were... not great. Perfectly edible, but far from perfect. The great thing about fatty, tough meats is that if you cook them long enough the flavorful fat will make up for all your errors, so even when it's bad, it's reasonably good. The four recipes had varying cooking times, from 2 to 12 hours. Most involved simmering/braising, though Albertitto's recipe called for an interesting immersed steaming technique. Two of the recipes called for refrigerating overnight.
Instead of using 15lb pork shoulder, I used a 2lb pork hock. That was probably my first mistake, but the pork hock was far more manageable, and I probably would have had leftovers like crazy with the 15lb piece. Also, pork hock is inexpensive. Instead of simmering for hours, I decided to give the vacuum pot a try. The word vacuum sounds fancy, and evokes images of sous vide. In reality, it's really just a giant thermos. The contents of the pot are not under vacuum - the pot has a vacuum insulating wall. You would cook something normally on the stove, and then stick it in the pot. The retained heat will continue cooking the food for hours as it cools - very slowly.
On my first attempt, I followed typical thermal vacuum instructions, bringing the water inside to a boil, then letting it cook in the vacuum pot for four hours. At the end the meat was a light tea colour, hardly the rich colour I expected. I refrigerated overnight. The meat had gotten darker. One big difference with vacuum pot cooking is that the fluid doesn't evaporate. I ended up simmering for a couple of hours until the sauce reduced. The meat darkened during the simmering process. Overall the flavour was right, the meat fell off the bone, but was a bit on the dry side. Although soft, the skin wasn't at the desired jelly-like consistency, but was chewier, though some people might like that.
I thought it might have been that the boiling temperature was too high. On my second attempt, I tried lower temperatures. I heated the pork/sauce to about 175F, but let it cool to 160F before I stuck it in the pot. When I took it out 4 hours later, it had dropped to about 138F. I heated it back up to 160F and gave it another 4 hours. When I pulled the pork out, and shoved a meat thermometer in, it was still far stiffer than it should have been. Again, I had to simmer the sauce down, which did enough to tenderize the meat to an edible state. The result the second time wasn't much better than the first.
My suspicion is that I'd need the low temperature going for much longer than 8 hours... perhaps 24 hours or more, which seems like a lot of work since I would need to reheat the pot every few hours, and make sure it doesn't drop into the bacterial danger zone below 130F. The simmering process seemed to do far more in tenderizing the meat than the slow cooking. I'll have to try out the 24 hour version sometime... as well as the simple braise and the steaming method. After I've worked off this year's calories.
They should make a movie out of this.
Stuxnet was probably the most intriguing virus to make news in the last few years. It's pretty old news now, and if you've forgotten, it's a virus that was floating around the internet back in 2010. It was intriguing in that it took advantage of four Windows zero-day exploits, which is a feat in itself. Even though it infected plenty of Windows machines, it only targeted SCADA control systems - control software for industrial machinery, not your everyday desktop. Further analysis concluded it was likely designed to infiltrate Iranian nuclear facilities and damage the centrifuges used for enriching weapons grade uranium.
This video isn't new, but it's probably the most interesting thing I've seen on YouTube in the past four months. Almost makes me miss the old WinDbg days. Almost. Warning: there's some serious language in this video.
Jan 3, 2012
No Quadrantids.
Was hoping to see some shooting stars out tonight, but I didn't think about how bright it is with the city lights outside, and all I see is cloud cover at 2am.
Off to bed I guess.
Nov 21, 2011
Earth from Space.
I remember hearing as a child how the Great Wall of China was one of the few man made structures visible from space. And then I saw all these National Geographic published photos of the earth, all blue and green. They didn't have night shots back then. This is incredible.
Earth | Time Lapse View from Space, Fly Over | NASA, ISS from Michael König on Vimeo.
Oct 5, 2011
Capture One 6.3: A Performance Review
Shortly before moving out of California, I noticed that the latest version of Capture One (C1) uses OpenCL. Capture One 6 was announced last December, so it was well over half a year at that point, and I was really surprised that I hadn't heard any buzz about it, especially from the NVIDIA or AMD marketing machines. You would think either would leap at the chance to sell more GPUs. Although GPUs could make RAW editing a whole lot smoother, NVIDIA has never paid serious attention to the photography market. I'm assuming someone must have crunched the numbers and determined that it was too niche a market to address. I know that there was some investigation at some point, which determined that the speedup for processing RAW was not significant enough. This was mostly because part of the RAW decode was done on the CPU, and memory transfer times between CPU and GPU nullified much of the benefit of the GPU decode. I suspected though, that the analysis didn't really reflect real-world situations where users would want to upload the image to the GPU once, and decode the RAW multiple times as they adjusted various settings. This would be a huge improvement for photo editing apps and make the UI much smoother. It looks like Phase One has gone and done that, at least to a degree.
So it's been about a month since I've gotten back, and I've barely touched my pictures in that time. I decided finally to sit down and give Capture One 6 a shot, since I switched over to Lightroom a couple of years ago. The software is already up to version 6.3. I used the trial version, so I could check out Pro. I was curious about some of the features, especially the advanced noise reduction and local adjustments. I'm running on Windows 7 SP1 64-bit, on a Core i7 930 with 12GB of RAM, and a motley assortment of hard disks. I tried using both a GF106 (GeForce GTS450 or Quadro 2000) as well as a GF100 (GeForce GTX 470 or Quadro 6000). I'm playing with full resolution RAW images shot on a Canon 5D MarkII. While I was pretty happy with C1 versions 2 and 3 for dealing with images from my Rebel and Rebel XTi on an old Athlon system, the huge images from the 5DMk2 have put some annoying lag into the workflow that never went away, even after upgrading to the i7.
OpenCL Benefits
C1 allows you to turn OpenCL support on and off in the Edit\Preferences dialog. There's a setting for OpenCL, either Auto or Never (this setting doesn't appear if your system doesn't have an OpenCL driver). Other than the setting, there's no other indicator that OpenCL is being used. There's information in a Phase One knowledgebase article on when OpenCL is used. It's pretty accurate, but not very detailed.
After poking around with the OpenCL setting enabled/disabled, I figured OpenCL is only used for processing the pixels onscreen when viewing a RAW file. This is pretty limited, but it does provide some noticeable benefits that do impact my workflow.
1. the view updates much quicker when switching between images, particularly when zoomed in
2. the view updates much quicker when making adjustments, particularly when zoomed in
*1 Given that these were all visual changes, it was difficult to get any benchmark measurements. Situation 1 was where I noticed the most measurable differences. When switching from one image to another on CPU only, the new image would load in very blurry, then get less blurry, then crisp. There was also about a 0.5 second delay after the keypress before the image changed at all, but that happened with the CPU and with every GPU, so I'm going to ignore it in the comparison and focus on the first delay after the image appears, and the second delay until it becomes clear. My rough attempts at measuring the time with a stopwatch make it look like this: new image appears very blurry -> 0.5 seconds -> less blurry -> 1 second -> crisp, so it would take almost 2 seconds every time I changed from one image to the next. It gets painful if you're sorting through 150 images after some sort of shoot.
GPU behavior looked the same at first, but upon closer inspection it looks like the pipeline was different. Instead of going from "very blurry"->"less blurry"->"crisp", it went through "blurry"->"crisp"->"denoised/sharpened" stages. It took barely any time at all on either GPU to get from blurry to crisp. I'll say 0.2 seconds. Additionally, the final "denoise/sharpen" stage seemed to vary a bit depending on the zoom level. Initially I thought the speed depended on the GPU, but then I noticed that at zoom levels > 100%, it looked like it would do some sort of denoise, while if the zoom was < 100%, it would sharpen. The sharpen (at low zoom) would take about 1 second on both GPUs (I'm guessing it was actually performed on the CPU). The denoise seemed to take about 0.5 second on the GF106, and was nearly instantaneous on the GF100.
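For Zoom < 100%
| Processor | Time to Clear Image (s) | Total time for Final Image (including initial delay) (s) |
| CPU | 1.8 | 2.3 |
| GF106 | 0.2 | 2 |
| GF100 | 0.2 | 2 |
For Zoom > 100%
| Processor | Time to Clear Image (s) | Total time for Final Image (including initial delay) (s) |
| CPU | 1 | 1.5 |
| GF106 | 0.2 | 1 |
| GF100 | 0.2 | 0.3 |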
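For a rough sense of scale, here's a back-of-the-envelope sketch of what that adds up to over a 150 image shoot. It only uses my stopwatch figures from the text (about 0.5 + 1 seconds to get a crisp image on the CPU, about 0.2 seconds on either GPU), it assumes every image is already cached, and the function name is made up for illustration.

```python
def seconds_saved(images, cpu_seconds, gpu_seconds):
    """Total time saved when each image becomes crisp faster on the GPU."""
    return images * (cpu_seconds - gpu_seconds)


IMAGES = 150              # size of the shoot mentioned above
CPU_TO_CRISP = 0.5 + 1.0  # very blurry -> less blurry -> crisp, in seconds
GPU_TO_CRISP = 0.2        # blurry -> crisp on either GPU, in seconds

saved = seconds_saved(IMAGES, CPU_TO_CRISP, GPU_TO_CRISP)
print(f"Roughly {saved:.0f} seconds saved over {IMAGES} images")
```

That works out to a bit over three minutes of staring at blurry pixels per pass through the shoot, which matches how it felt.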
*2 When making adjustments with the sliders, it was noticeably smoother on the GPUs than the CPU, though even the GPU wasn't perfectly smooth. It also depended on the setting. For example, adjusting Exposure was reasonably smooth on the CPU, and wasn't too much smoother on the GPU. The Moire slider though, which was completely un-smooth on the CPU, showed a marked difference. The biggest difference however, was that when zoomed in, the CPU version of Exposure, Curves, and Colour would use the preview sized image while adjusting. Once you move the slider, the image would get downsampled and blurry to show the adjustments, and then come back into focus. The GPU version would remain in focus the whole time. The GF100 was a bit smoother than the GF106. This isn't something you'd notice initially, since it just seems like the way it ought to be. But if you go back to the CPU version, it's horrible.
Things that don't improve with OpenCL
For all the improvements, there are a lot of things that don't improve, and really reduce the effectiveness of the OpenCL implementation. As I've already mentioned, it looks like there's a CPU sharpening pass that slows down the image rendering. Other noticeable items are:
- Speed depends on whether the image is cached. The numbers above are the best case. If you switch to an image that isn't cached, it could take a few seconds to read from disk, regardless of your CPU or GPU. This actually happens a lot, so you don't really get the optimized speeds listed above unless you're going back and forth between a set of images.
- JPEGs don't go through the GPU pipeline, so if you happen to have JPEG thumbnails, sorting through them will slow you down, even though scrolling through RAWs is now faster.
- Zooming in isn't sped up. You still see a bunch of blocky, pixelated pixels for half a second before the proper pixels are displayed.
- Panning isn't sped up, so when you move the image over, you get a bunch of new pixelated pixels for half a second before the proper pixels are displayed.
- Final rendering doesn't take advantage of the GPU. You might get a slight improvement if renders are running in the background, since the CPU is a little less busy with the UI.
Shoehorned in
As I mentioned initially, it looks like Phase One took whatever was visible onscreen, and sped up the processing of that bitmap using OpenCL. This makes sense given that they have an existing codebase. It means they don't have to start from scratch, and they can reuse existing functions like their CPU sharpening algorithm. The problem is that their original design is based on the fact that processing many pixels is processor intensive, so they optimize by only processing the pixels onscreen. Trying to shim the GPU into their existing design is rather limiting. They're most likely sending the visible bitmap to the GPU for OpenCL processing, which is fast. Then I suspect they're copying it back down to system memory so that it can fit right back into their existing pipeline and have the CPU finish whatever work it needs to do. There are a few items that are sped up, but they're far from perfect, and in the grand scheme of things, the perceived improvement is just not that spectacular given the bottlenecks.
The ideal design to take full advantage of the GPU would be to upload the entire image to video memory, and manipulate it in video memory using the GPU, and display it, without copying it back down to system memory. With GPU memories being typically 512MB or 1GB+ these days, it shouldn't be a problem to fit multiple 20MP images in video memory. Making copies of the images is much faster on the GPU as well, since video memory tends to be 3-10x as fast as system memory. The result would be much smoother performance in adjustments, as well as smooth zooming and panning.
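To sanity check the memory claim, here's a quick sketch of the arithmetic. The 5616x3744 resolution is the 5D MarkII's full size; the pixel formats are my own assumptions about what a GPU pipeline might store (I have no idea what C1 actually uses), and it ignores whatever video memory the desktop and framebuffers already take.

```python
WIDTH, HEIGHT = 5616, 3744  # full resolution 5D MarkII frame, about 21MP
MIB = 1024 * 1024


def image_bytes(width, height, channels, bytes_per_channel):
    """Size of one uncompressed image held in video memory."""
    return width * height * channels * bytes_per_channel


def images_that_fit(vram_bytes, per_image_bytes):
    """How many whole copies of the image fit in the given memory."""
    return vram_bytes // per_image_bytes


# Assumed storage formats: (label, channels, bytes per channel).
FORMATS = [
    ("8-bit RGBA", 4, 1),
    ("16-bit RGBA", 4, 2),
    ("32-bit float RGBA", 4, 4),
]

for label, channels, depth in FORMATS:
    size = image_bytes(WIDTH, HEIGHT, channels, depth)
    for vram_mib in (512, 1024):
        count = images_that_fit(vram_mib * MIB, size)
        print(f"{label}: {size / MIB:.0f}MB each, {count} fit in {vram_mib}MB")
```

Under these assumptions a 512MB card holds six 8-bit copies or three 16-bit copies of a full frame, so the original plus a couple of working copies fits even at higher precision. Only the 32-bit float case gets tight on the smaller card.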
The main drawback, and I suspect the reason Phase One didn't go down this route, is that it requires rewriting the entire application. All the complicated image processing algorithms would need to be rewritten. Not only that but they'd need to maintain both pipelines. It's hard to argue against this. While technically far superior, this would mean twice the work. Phase One would need twice the sales to justify it. If C1 was fully GPU accelerated, I'd probably switch to it over Lightroom. One strong argument is that an i3 laptop with a midrange GPU could outperform a much more expensive i7 system. It's quite possible that C1 could take a sizeable share of the Lightroom market if their app is that much faster and smoother.
They could also potentially support one pipeline with OpenCL (or CUDA), and rely on CPU implementations when the user does not have a GPU.
A third, weak argument to design C1 around GPU processing is that the CPUs on forthcoming Win8 tablet PCs will be rather weak. A GPU solution would be far smoother, especially for allowing the user to drag the image around with their fingertips. A RAW editing app on a tablet would be great, but I suspect it would HAVE to be GPU oriented. Problem is none of the tablet GPUs are particularly programmable yet. We'd probably have to wait another generation - I suspect late 2012, and at this point I wouldn't know whether CUDA, OpenCL or DirectX 12 would be the way to go.
Other Bottlenecks
The other major bottleneck in C1 is the disk while rendering final images. The rendering is done purely on CPU. Tests across the multiple GPUs showed the same time, roughly 3:20 for 30 RAWs from a 5D MarkII. On a quad core i7, the CPU cycled between 0-100% load. The average was only maybe 50%. I suspected the bottleneck was my disk, so I tried using two disks, using one as the source, and the other as the destination. That made no difference. What I did notice was that C1 was writing to 8 output JPEGs at once. Most likely, the thrashing caused by this was limiting performance. The disk output was pretty slow, maybe 1-5MB/s, probably due to the thrashing. Phase One may be relying on their customers to purchase SSDs or use RAID arrays, but if they queued up their disk writes, it could potentially halve their rendering time on quad-core CPUs.
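Here's a minimal sketch of what I mean by queueing up the disk writes: the render workers hand finished images to a single writer thread, so only one file is being written at any time. This is not how C1 works internally (I have no idea); render() is a hypothetical stand-in for the real RAW conversion, and in Python the threads only illustrate the structure, since pure Python code won't actually render in parallel.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def render(index):
    """Hypothetical stand-in for the CPU-heavy RAW to JPEG render."""
    return f"image_{index:03d}.jpg", bytes([index % 256]) * 1024


def writer(jobs, out_dir):
    """Write queued files one at a time until the None sentinel arrives."""
    while True:
        job = jobs.get()
        if job is None:
            break
        name, data = job
        (out_dir / name).write_bytes(data)


def render_all(count, out_dir, workers=8):
    """Render on several workers, but serialize all disk output."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    jobs = queue.Queue()
    disk = threading.Thread(target=writer, args=(jobs, out_dir))
    disk.start()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Results come back in order and are handed straight to the writer.
        for result in pool.map(render, range(count)):
            jobs.put(result)
    jobs.put(None)  # tell the writer there is nothing left
    disk.join()
    return sorted(p.name for p in out_dir.iterdir())
```

The point of the design is that the disk sees one sequential write at a time instead of 8 interleaved ones, while the workers stay free to start on the next image.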
As a side note, I probably bought the wrong components for my PC. With a quad core i7 and 12GB of RAM, I rarely saturate the CPU (only when compiling using MSVC), or the RAM (only when running multiple VMs). Very few apps seem to be optimized for a fast CPU and lots of RAM. I had thought that the extra RAM would mean fewer disk accesses, but there are still many cases (like both C1 and Lightroom) where I'm disk bound, with plenty of free memory.
Other notes on C1
Capture One LE was my first RAW workflow application, and I'm probably biased towards it because of that. I never fully got used to Lightroom's model with different modes, and Lightroom's export to JPEG always felt a little weird, since the UI features rendering for print or web, and I never do either. When C1 was redesigned with the new .NET based UI, there were a number of things that turned me off, and I eventually switched over to Lightroom. I always organized my RAW files in folders by date. I hated the way C1 would put its working folder into every one of my folders. I loved the way the older C1 let me put all my final output files in subfolders of the image folders - now this seems to be available in the Pro version only. I find this seriously annoying.
In addition to those annoyances, the slowness of C1, the poor noise reduction, and the spot healing tool led me over to Lightroom. The OpenCL support has definitely improved the performance. The noise reduction seems much improved as well, though I haven't played with it enough to really judge it against Lightroom 3. There's a new spot removal tool that works pretty well. They've also added in keystone correction, which is only in the Pro version.
Right now I'm quite happy to use C1 Pro over LR3. It's $399 though, which is pretty steep unless it's discounted. There's a handful of features that I'd use in the Pro over Express - the output folder management, along with keystone correction, and maybe RGB curves. If they could come up with a third intermediate version that would match the LR3 feature set, that'd be perfect.
So it's been about a month since I've gotten back, and I've barely touched my pictures in that time. I decided finally to sit down and give Capture One 6 a shot, since I switched over to Lightroom a couple of years ago. The software is already up to version 6.3. I used the trial version, so I could check out Pro. I was curious about some of he features, especially the advanced noise reduction and local adjustments. I'm running on Windows 7 SP1 64-bit, on a Core i7 930 with 12GB of RAM, and a motley assortment of hard disks. I tried using both a GF106 (GeForce GTS450 or Quadro 2000) as well as a GF100 (GeForce GTX 470 or Quadro 6000). I'm playing with full resolution RAW images shot on a Canon 5D MarkII. While I was pretty happy with C1 versions 2 and 3 for dealing with images from my Rebel and Rebel XTi on an old Athlon system, the huge images from the 5DMk2 has put some annoying lag into the workflow, that never went away even after upgrading to the i7.
OpenCL Benefits
C1 allows you to turn on and off OpenCL support in the Edit\Preferences dialog. There's a setting for OpenCL, either Auto or Never (this setting doesn't appear if your system doesn't have an OpenCL driver). Other than the setting, there's no other indicator that OpenCL is being used. There's information in a Phase One knowledgebase article on when OpenCL is used, it's pretty accurate, but not very detailed.
After poking around with the OpenCL setting enabled/disabled, I figured OpenCL is only used for processing the pixels onscreen when viewing a RAW file. This is pretty limited, but it does provide some noticeable benefits that do impact my workflow.
1. the view updates much quicker when switching between images, particularly when zoomed in
2. the view updates much qucker when making adjustments, particularly when zoomed in
*1 Given that these were all visual changes, it was difficult to give any benchmark measurements. Situation 1 was where I noticed the most measurable differences. When switching from one image to another on CPU only, the new image would load in very blurry, then get less blurry, then crisp. There was also about a 0.5 second delay before the image changed at all, but that happened with both CPU and GPU, so I'm going to ignore that in the comparision. My rough attempts at measuring the time with a stopwatch makes it look like this: new image appears very blurry -> 0.5 seconds -> less blurry -> 1 second -> crisp, so it would take almost 2 seconds every time I changed from one image to the next. It gets painful if you're sorting through 150 images after some sort of shoot. The first 0.5 second delay after the keypress was there on every GPU, so I'm going to ignore that, and focus on the first delay after the image appears, and the second delay when it becomes clear.
GPU behavior looked the same, but upon closer inspection it looks liek the pipeline was different. Instead of going from "very blurry"->"less blurry"->crisp, it went through a "blurry"->"crisp"->"denoised/sharpened" stages. It was barely any time at all on either GPU to get from blurry to crisp. I'll say 0.2 seconds. Additionally, the final "denoise/sharpen" stage seemed to vary a bit depending on the zoom level. Initially I thought the speed depended on the GPU, but then I noticed that in zoom levels > 100%, it looked like it would do some sort of denoise, while if the zoom < 100%, it would sharpen. The sharpen (at low zoom) would take about 1 second on both GPUs (I'm guessing it was actually performed on the CPU). The denoise seemed to take about 0.5 second on the GF106, and was nearly instantaneous on the GF100.
For Zoom < 100
| Processor | Time to Clear Image | Total time for Final Image (including initial delay) |
| CPU | 1.8 | 2.3 |
| GF106 | 0.2 | 2 |
| GF100 | 0.2 | 2 |
For Zoom > 100
| Processor | Time to Clear Image | Total time for Final Image (including initial delay) |
| CPU | 1 | 1.5 |
| GF106 | 0.2 | 1 |
| GF100 | 0.2 | 0.3 |
*2 When making adjustments with the sliders, it was noticeably smoother on the GPUs than the CPU, though even the GPU wasn't perfectly smooth. It also depended on the setting. For example, adjusting Exposure was reasonably smooth on the CPU, and wasn't too much smoother on the GPU. The Moire slider though, which was completely un-smooth on the CPU, showed a marked difference. The biggest difference however, was that when zoomed in, the CPU version of Exposure, Curves, and Colour would use the preview sized image while adjusting. Once you move the slider, the image would get downsampled and blurry to show the adjustments, and then come back into focus. The GPU version would remain in focus the whole time. The GF100 was a bit smoother than the GF106. This isn't something you'd notice initially, since it just seems like the way it ought to be. But if you go back to the CPU version, it's horrible.
Things that don't improve with OpenCL
For all the improvements, there are a lot of things that don't improve, and really reduce the effectiveness of the OpenCL implementation. As I've already mentioned, it looks like there's a CPU sharpening pass that slows down the image rendering. Other noticeable items are:
- Speed depends on whether the image is cached. The numbers above are the best case. If you switch to an image that isn't cached, it could take a few seconds to read from disk, regardless of your CPU or GPU. This actually happens a lot, so you don't really get the optimized speeds listed above unless you're going back and forth between a set of images.
- JPEGs don't go through the GPU pipeline, so if you happen to have JPEG thumbnails, sorting through them will slow you down, even though scrolling through RAWs is now faster.
- Zooming in isn't sped up, you still see a bunch of pixelated pixels for half a second before the proper pixels are displayed.
- Panning isn't sped up, so when you move the image over, you get a bunch of new pixelated pixels for half a second before the proper pixels are displayed.
- Final rendering doesn't take advantage of the GPU. You might get a slight improvement if they're going on in the background, since the CPU is a little less busy with the UI.
Shoehorned in
As I mentioned initially, it looks like Phase One took whatever was visible onscreen, and sped up the processing of that bitmap using OpenCL. This makes sense given that they have an existing codebase, This means they don't have to start from scratch, and they can reuse existing functions like their CPU sharpening algorithm. The problem is that their original design is based on the fact that processing many pixels is processor intensive, so they optimize by only processing the pixels onscreen. Trying to shim the GPU using their existing design is rather limiting. They're most likely sending the visible bitmap to the GPU for OpenCL processing, which is fast. Then I suspect they're copying back down to system memory so that it can fit right back into their existing pipeline and have the CPU finish whatever work it needed to do. There's a few items that are sped up, but they're far from perfect, and in the grand scheme of things, they perceived improvement is just not that spectacular given the bottlenecks.
The ideal design to take full advantage of the GPU would be to upload the entire image to video memory, and manipulate it in video memory using the GPU, and display it, without copying it back down to system memory. With GPU memories being typically 512MB or 1GB+ these days, it should be a problem to fit multiple 20MP images in video memory. Making copies of the images is much faster on the GPU as well, since video memory tends to be 3-10x as fast as system memory. The result would be much smoother performance in adjustments, as well as smooth zooming and panning.
The main drawback, and I suspect the reason Phase One didn't go down this route, is that it requires rewriting the entire application. All the complicated image processing algorithms would need to be rewritten. Not only that but they'd need to maintain both pipelines. It's hard to argue against this. While technically far superior, this would mean twice the work. Phase One would need twice the sales to justify it. If C1 was fully GPU accelerated, I'd probably switch to it over Lightroom. One strong argument is that an i3 laptop with a midrange GPU could outperform a much more expensive i7 system. It's quite possible that C1 could take a sizeable share of the Lightroom market if their app is that much faster and smoother.
They could also potentially support one pipeline with OpenCL (or CUDA), and rely on CPU implementations when the user does not have a GPU.
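One way to picture that single pipeline with a CPU fallback is a lookup that prefers a GPU implementation of each step and drops back to the CPU version when none is available. This is only a sketch: the operation names, `CPU_OPS`, `GPU_OPS`, and `run_pipeline` are all made up for illustration, and the GPU table is left empty because the sketch does not talk to a real OpenCL or CUDA runtime.

```python
def exposure_cpu(pixels, gain):
    """CPU stand-in for an exposure adjustment on 8-bit values."""
    return [min(255, round(p * gain)) for p in pixels]

def invert_cpu(pixels):
    """CPU stand-in for inverting 8-bit values."""
    return [255 - p for p in pixels]

# Hypothetical operation tables. GPU_OPS would hold OpenCL/CUDA-backed
# callables with the same signatures; it is empty in this sketch.
CPU_OPS = {"exposure": exposure_cpu, "invert": invert_cpu}
GPU_OPS = {}

def resolve(op_name, have_gpu):
    """Prefer the GPU version of a step, fall back to the CPU version."""
    if have_gpu and op_name in GPU_OPS:
        return GPU_OPS[op_name]
    return CPU_OPS[op_name]

def run_pipeline(pixels, steps, have_gpu=False):
    """Apply each (op_name, args) step in order."""
    for op_name, args in steps:
        pixels = resolve(op_name, have_gpu)(pixels, *args)
    return pixels

if __name__ == "__main__":
    steps = [("exposure", (1.5,)), ("invert", ())]
    print(run_pipeline([10, 100, 200], steps))
```

The pipeline definition stays the same either way; only the lookup decides where each step runs. The catch the post points out still applies, since every step needs a second implementation to be maintained.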
A third, weak argument to design C1 around GPU processing is that the CPUs on forthcoming Win8 tablet PCs will be rather weak. A GPU solution would be far smoother, especially for allowing the user to drag the image around with their fingertips. A RAW editing app on a tablet would be great, but I suspect it would HAVE to be GPU oriented. Problem is none of the tablet GPUs are particularly programmable yet. We'd probably have to wait another generation - I suspect late 2012, and at this point I wouldn't know whether CUDA, OpenCL or DirectX 12 would be the way to go.
Other Bottlenecks
The other major bottleneck in C1 is the disk while rendering final images. The rendering is done purely on CPU. Tests across the multiple GPUs showed the same time, roughly 3:20 for 30 RAWs from a 5D MarkII. On a quad core i7, the CPU cycled between 0-100% load. The average was only maybe 50%. I suspected the bottleneck was my disk, so I tried using two disks, using one as the source, and the other as the destination. That made no difference. What I did notice was that C1 was writing to 8 output JPEGs at once. Most likely, the thrashing caused by this was limiting performance. The disk output was pretty slow, maybe 1-5MB/s, probably due to the thrashing. Phase One may be relying on their customers to purchase SSDs or use RAID arrays, but if they queued up their disk writes, it could potentially halve their rendering time on quad-core CPUs.
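To make the "queue up the disk writes" idea concrete, here is a minimal sketch in which several render workers hand finished images to one writer thread, so only one output file is being written at any moment. It is an illustration of the general pattern, not a description of how C1 works: `render` is a hypothetical stand-in for the real RAW conversion, and Python threads are used only to show the structure (they would not give real CPU parallelism for heavy work).

```python
import os
import queue
import tempfile
import threading

def render(name):
    """Hypothetical stand-in for the CPU-heavy RAW-to-JPEG conversion."""
    return name, name.encode() * 1024

def render_worker(jobs, finished):
    """Pull image names, render them, and hand the bytes to the writer."""
    while True:
        name = jobs.get()
        if name is None:          # sentinel: no more work for this worker
            break
        finished.put(render(name))

def writer(finished, out_dir, written):
    """Single writer: files hit the disk one after another."""
    while True:
        item = finished.get()
        if item is None:          # sentinel: all renders are done
            break
        name, data = item
        path = os.path.join(out_dir, name + ".jpg")
        with open(path, "wb") as f:
            f.write(data)
        written.append(path)

def export(names, out_dir, workers=4):
    jobs, finished, written = queue.Queue(), queue.Queue(), []
    w = threading.Thread(target=writer, args=(finished, out_dir, written))
    w.start()
    pool = [threading.Thread(target=render_worker, args=(jobs, finished))
            for _ in range(workers)]
    for t in pool:
        t.start()
    for n in names:
        jobs.put(n)
    for _ in pool:
        jobs.put(None)            # one sentinel per render worker
    for t in pool:
        t.join()
    finished.put(None)            # tell the writer to stop
    w.join()
    return written

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        paths = export([f"IMG_{i:04d}" for i in range(8)], d)
        print(len(paths), "files written")
```

Whether this would actually halve the render time depends on the disk really being the bottleneck, which is only a suspicion here.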
As a side note, I probably bought the wrong components for my PC. With a quad core i7 and 12GB of RAM, I rarely saturate the CPU (only when compiling using MSVC), or the RAM (only when running multiple VMs). Very few apps seem to be optimized for a fast CPU and lots of RAM. I had thought that the extra RAM would mean fewer disk accesses, but there are still many cases (like both C1 and Lightroom) where I'm disk bound, with plenty of free memory.
Other notes on C1
Capture One LE was my first RAW workflow application, and I'm probably biased towards it because of that. I never fully got used to Lightroom's model with different modes, and Lightroom's export to JPEG always felt a little weird, since the UI features rendering for print or web, and I never do either. When C1 was redesigned with the new .NET based UI, there were a number of things that turned me off, and I eventually switched over to Lightroom. I always organized my RAW files in folders by date. I hated the way C1 would put its working folder into every one of my folders. I loved the way the older C1 let me put all my final output files as subfolders of the image folders - now this seems to be available in the Pro version only. I find this seriously annoying.
In addition to those annoyances, the slowness of C1, the poor noise reduction, and the spot healing tool led me over to Lightroom. The OpenCL support has definitely improved the performance. The noise reduction seems much improved as well, though I haven't played with it enough to really judge it against Lightroom 3. There's a new spot removal tool that works pretty well. They've also added in keystone correction, which is only in the Pro version.
Right now I'm quite happy to use C1 Pro over LR3. It's $399 though, which is pretty steep unless it's discounted. There's a handful of features that I'd use in the Pro over Express - the output folder management, along with keystone correction, and maybe RGB curves. If they could come up with a third intermediate version that would match the LR3 feature set, that'd be perfect.
Sep 9, 2011
Canada vs. The World
Did a bit of research on current smartphone OS market share.
I assume the high adoption for Android and Symbian globally is because cheaper phones are more common in less wealthy countries. Canadians must be pretty well off, since they love their iPhones, and hate on Android. I'm not sure if the RIM-love is patriotism, deals between RIM and Bell/Rogers, or some kind of duty implication that benefits RIM handsets. (US/Canada data from Comscore, Global data from Gartner)
| | Android | iOS | RIM | Symbian |
| Global | 43% | 18% | 12% | 22% |
| Canada | 12% | 31% | 42% | 6.4% |
| US | 36% | 26% | 21% | 1.9% |