The following figure shows the analysis results for the Divergent Execution analysis stage. Some kernel instance analysis results, like Divergent Execution, are associated with specific source lines within the kernel. To see the source associated with each result, select an entry from the table.
The source file associated with that entry will open. Devices with compute capability 5.2 and higher support PC sampling. With this feature, the PC and the state of the warp are sampled at a regular interval for one of the active warps per SM. The warp state indicates whether that warp issued an instruction in a cycle or, if it could not issue, why it was stalled.
When a warp that is sampled is stalled, there is a possibility that in the same cycle some other warp is issuing an instruction.
Hence a stall in the sampled warp does not necessarily indicate a hole in the instruction issue pipeline. Refer to the Warp State section for a description of the different states. Devices with compute capability 6.0 and higher additionally collect latency samples. The latency samples indicate the reasons for holes in the issue pipeline. While collecting these samples, no instruction is issued in the respective warp scheduler, and hence these samples give the latency reasons.
The latency reason will be one of the stall reasons listed in the Warp State section, except for the 'not selected' stall reason. In this view, the sample distribution for all functions and kernels is given in a table.
A pie chart shows the distribution of stall reasons collected for each kernel. After clicking on the source file or device function the Kernel Profile - PC Sampling view is opened. The hotspots shown next to the vertical scroll bar are determined by the number of samples collected for each source and assembly line.
The distribution of the stall reasons is shown as a stacked bar for each source and assembly line. This helps in pinpointing the latency reasons at the source code level. For devices with compute capability 6.0 and higher, hotspots can be selected to point to the hotspot of 'Warp State' or 'Latency Reasons'. The tables in the result section give the percentage distribution for total latency samples, issue pipeline busy samples, and instruction issued samples.
The chart shows a summary view of the memory hierarchy of the CUDA programming model. The green nodes in the diagram depict logical memory spaces whereas the blue nodes depict actual hardware units on the chip.
For the various caches the reported percentage states the cache hit rate; that is, the ratio of requests that could be served with data locally available in the cache to all requests made. The links between the nodes in the diagram depict the data paths from the SMs through the memory spaces into the memory system. Different metrics are shown per data path. The data paths from the SMs to the memory spaces (Global, Local, Texture, Surface, and Shared) report the total number of memory instructions executed, including both read and write operations.
The data path between memory spaces and "Unified Cache" or "Shared Memory" reports the total amount of memory requests made. All other data paths report the total amount of transferred memory in bytes. The topology is collected by default along with the timeline.
A logical link comprises 1 to 4 physical NVLinks with the same properties connected between two devices. The Source-Disassembly View is used to display the analysis results for a kernel at the source and assembly instruction level. To be able to view the kernel source, you need to compile the code using the -lineinfo option. If this compiler option is not used, only the disassembly view will be shown. As part of the Guided Analysis or Unguided Analysis for a kernel, the analysis results are displayed under the Analysis view.
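To enable the source view, the application can be compiled with line information before profiling. A minimal sketch, where the file and binary names are placeholders:

```shell
# -lineinfo embeds source line information in the binary.
nvcc -lineinfo -o myapp myapp.cu

# Profile as usual; the Source-Disassembly view can now correlate SASS with source.
nvprof --export-profile myapp.nvvp ./myapp
```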
After clicking on the source file or device function the Source-Disassembly view is opened. If the source file is not found a dialog is opened to select and point to the new location of the source file. This can happen for example when the profiling is done on a different system. Hotspots are colored based on level of importance - low, medium or high.
Hovering the mouse over the hotspot displays the value of the profiler data, the level of importance and the source or disassembly line. You can click on a hotspot at the source level or assembly instruction level to view the source or disassembly line corresponding to the hotspot. In the disassembly view the assembly instructions corresponding to the selected source line are highlighted. You can click on the up and down arrow buttons displayed at the right of the disassembly column header to navigate to the next or previous instruction block.
The GPU Details View displays a table of information for each memory copy and kernel execution in the profiled application. The following figure shows the table containing several memcpy and kernel executions. Each row of the table contains general information for a kernel execution or memory copy.
For kernels, the table will also contain a column for each metric or event value collected for that kernel. In the figure, the Achieved Occupancy column shows the value of that metric for each of the kernel executions. You can sort the data by column by left clicking on the column header, and you can rearrange the columns by left clicking on a column header and dragging it to its new location. If you select a row in the table, the corresponding interval will be selected in the Timeline View.
Similarly, if you select a kernel or memcpy interval in the Timeline View the table will be scrolled to show the corresponding data. If you hover the mouse over a column header, a tooltip will display the data shown in that column. For a column containing event or metric data, the tooltip will describe the corresponding event or metric. The Metrics Reference section contains more detailed information about each metric.
Specific event and metric values can be collected for each kernel and displayed in the details table. Use the toolbar icon in the upper right corner of the view to configure the events and metrics to collect for each device, and to run the application to collect those events and metrics. By default the table shows one row for each memcpy and kernel invocation. Alternatively, the table can show summary results for each kernel function.
Use the toolbar icon in the upper right corner of the view to select or deselect summary format. The numbers in the table can be displayed either with or without grouping separators. Use the toolbar icon in the upper right corner of the view to select or deselect grouping separators. The contents of the table can be exported in CSV format using the toolbar icon in the upper right corner of the view.
This view details the amount of time your application spends executing functions on the CPU. Each thread is sampled periodically to capture its callstack, and the summary of these measurements is displayed in this view. You can manipulate the view by selecting different orientations for organizing the callstack: Top-down, Bottom-up, or Code Structure (3); by choosing which thread to view (1); and by sorting or highlighting a specific thread (7, 8).
This change to the view is the result of sorting by a thread (7) and highlighting it (8). To be displayed, the source files must be on the local file system. By default the directory containing the executable or profile file is searched.
If the source file cannot be found, a prompt will appear asking for its location. Sometimes a file within a specific directory is being sought; in this case you should give the path to where this directory resides. The time your application spends in a parallel region or idling is shown both on the timeline and summarized in this view.
The reference for the percentage of time spent in each type of activity is the time from the start of the first parallel region to the end of the last parallel region. The Properties View shows information about the row or interval highlighted or selected in the Timeline View. If a row or interval is not selected, the displayed information tracks the motion of the mouse pointer. If a row or interval is selected, the displayed information is pinned to that row or interval.
When an OpenACC interval with an associated source file is selected, this filename is shown in the Source File table entry. Double-clicking on the filename opens the respective source file if it is available on the file-system.
The Console View shows stdout and stderr output of the application each time it executes. If you need to provide stdin input to your application, do so by typing into the console view. The Settings View allows you to specify execution settings for the application being profiled.
As shown in the following figure, the Executable settings tab allows you to specify the executable file, the working directory, the command-line arguments, and the environment for the application.
Only the executable file is required; all other fields are optional. The Executable settings tab also allows you to specify an optional execution timeout. If the execution timeout is specified, the application execution will be terminated after that number of seconds. If the execution timeout is not specified, the application will be allowed to continue execution until it terminates normally. The Start execution with profiling enabled checkbox is set by default to indicate that application profiling begins at the start of application execution.
If you are using cudaProfilerStart and cudaProfilerStop to control profiling within your application as described in Focused Profiling , then you should uncheck this box.
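A minimal sketch of focused profiling with the CUDA profiler API; the kernel name and launch configuration below are hypothetical:

```cuda
#include <cuda_profiler_api.h>

__global__ void kernelOfInterest(float *data) { /* ... */ }

int main(void) {
    float *d_data;
    cudaMalloc(&d_data, 1024 * sizeof(float));

    // Warm-up launch, excluded from the profile.
    kernelOfInterest<<<32, 128>>>(d_data);

    cudaProfilerStart();                    // begin collecting profile data
    kernelOfInterest<<<32, 128>>>(d_data);
    cudaDeviceSynchronize();
    cudaProfilerStop();                     // stop collecting

    cudaFree(d_data);
    return 0;
}
```

With nvprof, the command-line equivalent of unchecking this box is the --profile-from-start off option.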
The Enable concurrent kernel profiling checkbox is set by default to enable profiling of applications that exploit concurrent kernel execution. If this checkbox is unset, the profiler will disable concurrent kernel execution. Disabling concurrent kernel execution can reduce profiling overhead in some cases and so may be appropriate for applications that do not exploit concurrent kernels.
The Enable power, clock, and thermal profiling checkbox can be set to enable low-frequency sampling of the power, clock, and thermal behavior of each GPU used by the application. The CPU source view can be opened from the CPU Details View by double-clicking on a function in the tree; the source file that corresponds to this function is then opened. Line numbers can be enabled by right-clicking the left side ruler.
When you first start the Visual Profiler , and after closing the Welcome page, you will be presented with a default placement of the views. By moving and resizing the views, you can customize the profiler to meet your development needs.
Any changes you make are restored the next time you start the profiler. To resize a view, simply left click and drag on the dividing area between the views. All views stacked together in one area are resized at the same time. To reorder a view in a stacked set of views, left click and drag the view tab to the new location within the view stack. As you drag the view, an outline will show the target location for the view. You can place the view in a new location, or stack it in the same location as other views.
You can undock a view from the profiler window so that the view occupies its own stand-alone window. You may want to do this to take advantage of multiple monitors or to maximize the size of an individual view. To undock a view, left click the view tab and drag it outside of the profiler window.
To dock a view, left click the view tab (not the window decoration) and drag it into the profiler window. Use the X icon on a view tab to close a view.
To open a view, use the View menu. Profiling options are provided to nvprof through command-line options. Profiling results are displayed in the console after the profiling data is collected, and may also be saved for later viewing by either nvprof or the Visual Profiler. To view the full help page, type nvprof --help. Specify the MPI implementation installed on your machine. Generate event dependency graph for host and device activities and run dependency analysis.
See Dependency Analysis for more information. Change the scope of subsequent --events, --metrics, --query-events and --query-metrics options. See Profiling Scope for more information. Change the scope of subsequent --events and --metrics options; the syntax is illustrated by the following example.
Example: --kernels "1:foo:bar:2" will profile any kernel whose name contains "bar" and is the 2nd instance on context 1 and on stream named "foo". Profile all processes launched by the same user who launched this nvprof instance. Note: Only one instance of nvprof can run with this option at the same time.
Under this mode, there's no need to specify an application to run. See Multiprocess Profiling for more information. Profile the application and all child processes launched by it. See Focused Profiling for more information. If enabled, this option can vastly improve kernel replay speed, as save and restore of the mutable state for each kernel pass will be skipped. Specifically, a kernel can malloc and free a buffer in the same launch, but it cannot call an unmatched malloc or an unmatched free.
Specify the source level metrics to be profiled on a certain kernel invocation. Use --devices and --kernels to select a specific kernel invocation. One or more of these may be specified, separated by commas. Note: Use --export-profile to specify an export file. See Source-Disassembly View for more information. See System Profiling for more information. Set an execution timeout in seconds for the CUDA application.
See Timeout and Flush Profile Data for more information. See Unified Memory Profiling for more information.
See OpenACC for more information. See OpenMP for more information. If the environment variable is not set it's an error. See Demangling for more information. Specify the unit of time that will be used in the output. See Adjust Units for more information.
Specify the option or options separated by commas to be traced. See Summary Mode for more information. By default, this option disables the summary output. Make nvprof send all its output to the specified file, or one of the standard channels. The file will be overwritten. If the file doesn't exist, a new one will be created. Note: This is the default.
See Redirecting Output for more information. Summary mode is the default operating mode for nvprof. For each kernel, nvprof outputs the total time of all instances of the kernel or type of memory copy as well as the average, minimum, and maximum time. The time for a kernel is the kernel execution time on the device. If your application uses Dynamic Parallelism, the output will contain one column for the number of host-launched kernels and one for the number of device-launched kernels.
For each kernel or memory copy, detailed information such as kernel parameters, shared memory usage, and memory transfer throughput is shown. For a host kernel launch, the kernel ID will be shown. For a device kernel launch, the kernel ID, parent kernel ID, and parent block will be shown. In some cases this mode can be faster than kernel replay mode if the application allocates a large amount of device memory.
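Typical invocations for these modes might look like the following; the application name is a placeholder:

```shell
# Summary mode (default): aggregate statistics per kernel and memory-copy type.
nvprof ./myapp

# GPU-Trace mode: one line per kernel launch or memory copy, in timeline order.
nvprof --print-gpu-trace ./myapp
```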
To collect all events available on each device, use the option --events all. To collect all metrics available on each device, use the option --metrics all.
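For example, where the application name is a placeholder:

```shell
# Collect every event, or every metric, available on each device.
nvprof --events all ./myapp
nvprof --metrics all ./myapp

# Report the "branch" event per multiprocessor instead of aggregated per GPU.
nvprof --aggregate-mode off --events branch ./myapp
```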
By default, event and metric values are aggregated across all units in the GPU. For example, multiprocessor specific events are aggregated across all multiprocessors on the GPU. If --aggregate-mode off is specified, values of each unit are shown.
For example, the "branch" event value can then be shown for each multiprocessor on the GPU. A timeout in seconds can be provided to nvprof.
The CUDA application being profiled will be killed by nvprof after the timeout. Profiling result collected before the timeout will be shown. Concurrent-kernel profiling is supported, and is turned on by default. To turn the feature off, use the option --concurrent-kernels off.
This forces concurrent kernel executions to be serialized when a CUDA application is run with nvprof. The profiling scope can be limited by the following options. Each string in the angle brackets can be a standard Perl regular expression. An empty string matches any number or character combination. Invocation number n indicates the nth invocation of the kernel. If the invocation is a positive number, it is strictly matched against the invocation of the kernel.
Otherwise it is treated as a regular expression. The invocation number is counted separately for each kernel, so, for instance, an invocation number of 3 will match the 3rd invocation of every kernel. By default, nvprof only profiles the application specified by the command-line argument. It does not trace child processes launched by that process. To profile all processes launched by an application, use the --profile-child-processes option. Exit this mode by typing "Ctrl-c". For devices that support system profiling, nvprof can enable low-frequency sampling of the power, clock, and thermal behavior of each GPU used by the application.
This feature is turned off by default. To turn on this feature, use --system-profiling on. To see the detail of each sample point, combine the above option with --print-gpu-trace. This feature is enabled by default. This feature can be disabled with --unified-memory-profiling off. To see the detail of each memory transfer while this feature is enabled, use --print-gpu-trace.
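For example, where the application name is a placeholder:

```shell
# Sample the power, clock, and thermal state of each GPU, with per-sample detail.
nvprof --system-profiling on --print-gpu-trace ./myapp

# Turn off unified memory profiling if its transfers are not of interest.
nvprof --unified-memory-profiling off ./myapp
```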
On multi-GPU configurations without P2P support between any pair of devices that support Unified Memory, managed memory allocations are placed in zero-copy memory.
In this case Unified Memory profiling is not supported. If the environment variable CUDA_MANAGED_FORCE_DEVICE_ALLOC is set to force managed allocations into device memory, Unified Memory profiling is supported. By default, nvprof adjusts the time units automatically to get the most precise time values. The --normalized-time-unit option can be used to get fixed time units throughout the results.
For each profiling mode, option --csv can be used to generate output in comma-separated values CSV format. The result can be directly imported to spreadsheet software such as Excel. For each profiling mode, option --export-profile can be used to generate a result file. This file is not human-readable, but can be imported back to nvprof using the option --import-profile , or into the Visual Profiler.
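For example, where the file and application names are placeholders:

```shell
# Comma-separated output, suitable for importing into spreadsheet software.
nvprof --csv --log-file results.csv ./myapp

# Binary profile for later viewing in nvprof or the Visual Profiler.
nvprof --export-profile profile.nvvp ./myapp
nvprof --import-profile profile.nvvp
```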
Use option --demangling off to turn this feature off. By default, nvprof sends most of its output to stderr. To redirect the output, use --log-file. This analysis can also be applied to imported profiles. This is the default for nvprof if not disabled using --profile-api-trace none.
The option --print-dependency-analysis-trace can be specified to change from a summary output to a trace output, showing computed metrics such as time on the critical path per function instance rather than per function type. The graph can be presented in different "views" (top-down, bottom-up, or flat), allowing the user to analyze the sampling data from different perspectives. For instance, the bottom-up view shown above can be useful in identifying the "hot" functions in which the application is spending most of its time.
The top-down view gives a breakdown of the application execution time, starting from the main function, allowing you to find "call paths" which are executed frequently. By default the CPU sampling feature is disabled. To enable it, use the option --cpu-profiling on. The next section describes all the options controlling the CPU sampling behavior. Table 1 contains OpenACC profiling related command-line options of nvprof. In "exclusive" mode, those two durations are subtracted. On 64-bit Linux platforms, nvprof supports recording OpenMP activities. Recording OpenMP activities requires a supported PGI OpenMP runtime version. Table 2 contains OpenMP profiling related command-line options of nvprof. Remote profiling is the process of collecting profile data from a remote system that is different from the host system on which that profile data will be viewed and analyzed.
There are two ways to perform remote profiling. You can profile your remote application directly from nsight or the Visual Profiler. Or you can use nvprof to collect the profile data on the remote system and then use nvvp on the host system to view and analyze the data. This section describes how to perform remote profiling by using the remote capabilities of nsight and the Visual Profiler.
Nsight Eclipse Edition supports full remote development including remote building, debugging, and profiling. Using these capabilities you can create a project and launch configuration that allows you to remotely profile your application. See the Nsight Eclipse Edition documentation for more information.
The Visual Profiler also enables remote profiling. As shown in the following figure, when creating a new session or editing an existing session you can specify that the application being profiled resides on a remote system. Once you have configured your session to use a remote application, you can perform all profiler functions in the same way as you would with a local application, including timeline generation, guided analysis, and event and metric collection.
The host and remote systems may run different operating systems or have different CPU architectures. Only a remote system running Linux is supported. The remote system must be accessible via SSH.
In certain remote profiling setups, the machine running the actual CUDA program is not accessible from the machine running the Visual Profiler.
These two machines are connected via an intermediate machine, which we refer to as the login node. The host machine is the one which is running the Visual Profiler. The login node is where the one-hop profiling script will run; we only need ssh, scp and perl on this machine. The compute node is where the actual CUDA application will run and be profiled.
The profiling data generated will be copied over to the login node, so that it can be used by the Visual Profiler on the host. To configure one-hop profiling, you need to do the following one-time setup:. Once this setup is complete, you can profile the application as you would on any remote machine.
Copying all data to and from the login and compute nodes happens transparently and automatically. This section describes how to perform remote profiling by running nvprof manually on the remote system and then importing the collected profile data into the Visual Profiler. There are three common remote profiling use cases that can be addressed by using nvprof and the Visual Profiler.
The profile data will be collected in the output files. You should copy these files back to the host system and then import them into the Visual Profiler as described in the next section. The collected profile data is viewed and analyzed by importing it into the Visual Profiler on the host system.
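A sketch of collecting data on the remote system and copying it back; all file, host, and metric names here are illustrative:

```shell
# On the remote system: collect a timeline, then metric data for analysis.
nvprof --export-profile timeline.nvvp ./myapp
nvprof --metrics achieved_occupancy,ipc -o metrics.nvvp ./myapp

# Copy the result files back to the host system for import.
scp timeline.nvvp metrics.nvvp user@host:profiles/
```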
See Import Session for more information about importing. To view collected timeline data, the timeline file must be imported. If metric or event data was also collected for the application, the corresponding metrics files can be imported along with the timeline. To view collected analysis data for an individual kernel, the analysis file must be imported. The timeline will show just the individual kernel that was specified during data collection.
After importing, the guided analysis system can be used to explore the optimization opportunities for the kernel. NVTX functions exist in multiple variants with different suffixes, performing the same core functionality with different parameter encodings.
Some of the NVTX functions are defined to have return values. For example, the nvtxRangeStart function returns a unique range identifier and nvtxRangePush function outputs the current stack level.
It is recommended not to use the returned values as part of conditional code in the instrumented application. The returned values can differ between various implementations of the NVTX library; consequently, dependencies on the return values might work with one tool but fail with another.
Markers are used to describe events that occur at a specific time during the execution of an application, while ranges detail the time span in which they occur. This information is presented alongside all of the other captured data, which makes it easier to understand the collected information. All markers and ranges are identified by a message string. The Ex version of the marker and range APIs also allows category, color, and payload attributes to be associated with the event using the event attributes structure.
A marker is used to describe an instantaneous event. A marker can contain a text message or specify additional information using the event attributes structure. Use nvtxMarkEx to create a marker containing additional attributes specified by the event attribute structure.
The start of a range can occur on a different thread than the end of the range. A range can contain a text message or specify additional information using the event attributes structure. Use nvtxRangeStartEx to create a range containing additional attributes specified by the event attribute structure. The start of a range must occur on the same thread as the end of the range.
Use nvtxRangePushEx to create a range containing additional attributes specified by the event attribute structure. Each push function returns the zero-based depth of the range being started. The nvtxRangePop function is used to end the most recently pushed range for the thread.
If the pop does not have a matching push, a negative value is returned to indicate an error. The layout of the structure is defined by a specific version of NVTX and can change between different versions of the Tools Extension library. Markers and ranges can use attributes to provide additional information for an event or to guide the tool's visualization of the data. Each of the attributes is optional and if left unspecified, the attributes fall back to a default value.
It is recommended that the caller zero-initialize the event attributes structure and then set at least its version and size fields before filling in any other attributes. The NVTX synchronization module provides functions to support tracking additional synchronization details of the target application. Naming OS synchronization primitives may allow users to better understand the data collected by traced synchronization APIs. Additionally, annotating a user-defined synchronization object can allow the user to tell the tools when the user is building their own synchronization system that does not rely on the OS to provide behaviors, and instead uses techniques like atomic operations and spinlocks.
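The marker, range, and attribute APIs described above can be sketched as follows; the messages and color are illustrative:

```c
#include <nvToolsExt.h>

void profiled_work(void)
{
    /* Instantaneous ASCII marker. */
    nvtxMarkA("checkpoint reached");

    /* Zero-initialize the attributes structure, then set version and size. */
    nvtxEventAttributes_t eventAttrib = {0};
    eventAttrib.version = NVTX_VERSION;
    eventAttrib.size = NVTX_EVENT_ATTRIB_STRUCT_SIZE;
    eventAttrib.colorType = NVTX_COLOR_ARGB;
    eventAttrib.color = 0xFF00FF00;
    eventAttrib.messageType = NVTX_MESSAGE_TYPE_ASCII;
    eventAttrib.message.ascii = "my range";

    /* Start/end range: may begin and end on different threads. */
    nvtxRangeId_t id = nvtxRangeStartEx(&eventAttrib);
    /* ... work ... */
    nvtxRangeEnd(id);

    /* Push/pop range: must begin and end on the same thread. */
    nvtxRangePushA("inner range");
    /* ... work ... */
    nvtxRangePop();
}
```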
Domains enable developers to scope annotations. By default all events and annotations are in the default domain. Additional domains can be registered. This allows developers to scope markers and ranges to avoid conflicts. The function nvtxDomainDestroy marks the end of the domain.
Destroying a domain unregisters and destroys all objects associated with it such as registered strings, resource objects, named categories, and started ranges.
The NVTX naming functions can be used to name the current host OS thread, as well as CUDA devices, contexts, and streams.
Registered strings are intended to increase performance by lowering instrumentation overhead. A string may be registered once, and the handle may be passed in place of the string wherever the APIs allow it.
The nvtxDomainRegisterStringA function is used to register a string.

If you have either OpenMPI or MPICH installed on your system, you can use the --annotate-mpi option and specify your installed MPI implementation. Only synchronous MPI calls are annotated using this built-in option.
For example, if you have OpenMPI installed, you can annotate your application by passing --annotate-mpi openmpi to nvprof. You can create this annotation library conveniently using the documentation and open-source scripts located here. To use nvprof to collect the profiles of the individual MPI processes, you must tell nvprof to send its output to unique files.
This is supported starting with CUDA 5. Below is an example run using Open MPI. Alternatively, you can turn on profiling only on the nodes of interest using the --profile-all-processes argument to nvprof. To do this, first log into the node you want to profile and start nvprof there. Any processes that run on the node where --profile-all-processes is running will automatically be profiled.
The profiling data will be written to the output files. Starting with CUDA 7, this feature is useful to spot resources associated with a specific rank when the user imports multiple files into the same timeline in the Visual Profiler.
Details about what types of additional arguments to use with nvprof can be found in the Multiprocess Profiling and Redirecting Output section. Timeline profiling can be done for all MPS clients on the same server.
Event or metric profiling results in serialization: only one MPS client will execute at a time. In the Visual Profiler, select the "Profile all processes" option from the drop-down, press "Next" and then "Finish". Import the nvprof-generated data files for each process using the multi-process import option.
Refer to the Import Multi-Process Session section. The figure below shows the MPS timeline view for three processes. Note that the Compute and kernel timeline rows show three kernels overlapping. The dependency analysis feature enables optimization of the program runtime and concurrency of applications utilizing multiple CPU threads and CUDA streams. It allows you to compute the critical path of a specific execution, detect waiting time, and inspect dependencies between functions executing in different threads or streams.
The dependency analysis in nvprof and the Visual Profiler is based on execution traces of applications. Typical dependencies modelled in this graph would be that a CUDA kernel cannot start before its respective launch API call, or that a blocking CUDA stream synchronization call cannot return before all previously enqueued work in this stream has been completed. From this dependency graph and the API models, wait states can be computed.
A wait state is the duration for which an activity, such as an API function call, is blocked waiting on an event in another thread or stream. Knowing where wait states occur and how long functions are blocked helps identify optimization opportunities for higher-level concurrency in the application. In addition to individual wait states, the critical path through the captured event graph makes it possible to pinpoint the function calls, kernels, and memory copies that are responsible for the total application runtime.
The critical path is the longest path through an event graph that does not contain wait states, i.e., optimizing activities on this path can directly improve the execution time. Waiting time is an indicator of load imbalance between execution streams. Rather than blocking immediately, one should attempt to overlap kernel executions with concurrent CPU work of similar runtime, thereby reducing the time that any computing device, CPU or GPU, is blocked.
Activities with a high time on the critical path have a high direct impact on the application runtime. Conversely, if no execution stream is waiting on a given kernel to finish, reducing its duration will likely not improve the overall application runtime.
Dependency analysis is available in Visual Profiler and nvprof. See section Dependency Analysis on how to use this feature in nvprof. The dependency and wait time analysis between different threads and CUDA streams only takes into account execution dependencies stated in the respective supported API contracts.
This especially does not include synchronization as a result of resource contention. For example, asynchronous memory copies enqueued into independent CUDA streams will not be marked dependent even if the concrete GPU has only a single copy engine.
Furthermore, the analysis does not account for synchronization using an unsupported API.

But, being a lazy perfectionist, I wondered if there's a way around all of that! Hi Dominique, there have been issues in the past with different languages and I can't recall what the problem was. If it's just the labels for the days and the months you might be OK to change them, but there might be issues with some of the formulas in the Excel file when it comes to working out dates.
The labels in the Word source file for the days of the week that aren't merge fields should be OK to change into your own language. Hi, thank you very much for this service.
The Personal on Personal Paper file, week on two pages, has the wrong dates. Could you fix this for me please? I tried the source file but I failed. Can you please confirm which number? The dates look fine when I checked them. Personal on Personal Paper number 5. When I open the file I only get 54 pages, which is fewer than there should be. Thank you very much for looking into it.
Please download a fresh copy and check the dates through the document, continuing through to January. Dear Steve, thank you for the pages! I have been using them for 3 years. In the pages, not all of the dates are correct: Personal on Personal Paper - Week on two pages. If it is possible, please correct this. Sincerely, Galina. If it is number 5, I updated the files this week. Please download a fresh copy and let me know if there is still an issue. Is there a way to save the planner pages as Excel instead of Word?
No because we use mail merge to create them and that wouldn't work for Excel, plus controlling the final format isn't so easy to do on Excel.
Why do you want them in Excel and not Word? Is there a way to save the calendar pages as…? I'm trying to print "Week on two pages Enhanced TM" but keep having difficulties: everything is printing great except the date and the little monthly calendar at the top of the right-hand side. I have downloaded and tried printing all the various Word versions with the same result.
Any help or advice greatly appreciated, please! Hi, can you email me a photo of the printed page and some more details of your setup so I can try to work out what might be going wrong? Hi Steve, thank you for all your work! I had a question about printing the personal size pages on letter size paper. I tried about 8 times to get the pages of the PDF to print back to back but was unable to get them to do that. I'm thinking maybe they're meant to be single pages?
It's the PDF that has all the dates of the month on lines on the left, and notes and goals on the right. I've printed double-sided PDFs on my one-sided laser printer plenty of times before, but I can't figure this one out. Any hints would be even more appreciated. Thank you! Have you tried using the Word version?
I will try and check that the PDF is set for Letter size although that's not going to be easy to do because I don't have access to letter size paper.
But I know that the Word file is set to letter size. No, I'm sorry, I don't have Word. Might one of them work with something like LibreOffice? I'm on an iPad but I can go to my computer. Thank you. We have Pocket on A4 inserts in the list, near the bottom of the list of inserts. You print them out, then cut the pages to pocket size, then punch them. It's not easy printing on Pocket size paper, hence printing on to A4.
Where can I get a password sheet made up for Classic A5, and address pages? Thanks. Amazon or Filofax, just order one. Or look in the list above for something similar. Hi, the Personal on A4 month on one page alignment is off slightly. I tried duplex and manual in Word and PDF. All other inserts print fine using A4 for Personal, Pocket and Mini, so I doubt it's the printer.
Regards, Pete. Hi Peter, when you say the page alignment is off slightly, which side of the page? The front side or the reverse side? Chaps, this is really great. I have been a Time Manager 6-hole user for 34 years! I think you might have saved my bacon! Two questions: 1) Where should I be looking to find the classic TM 'day to a page' pages and the utterly brilliant 'month per page' diary?
David, London, England. Hello David, contact me by email with some photos of the inserts you are looking for, with the size details as well, and we will see what we can create in a similar format. My email address is on the About page here. I love your inserts but unfortunately I can't always use them: is there a way to make the weeks start on Sunday instead? Creating a Sunday start is possible, but you will need to modify the files yourself. Diaries: please download, adapt, use and share, but do not charge for them or use them commercially.
For previous years click here. For future years click here. Click on the name to see a preview of the insert; note that the date doesn't reflect the files, as the previews are not updated every year. The print-ready Word or PDF files are a full 12 months for each year listed at the top of the column. The source files are in the Word and Excel columns; you will need both files for each insert.
In the Excel files you will find a Year sheet and other worksheets. To change the year, enter the year in the appropriate cell; note that the source files don't change from year to year. When carrying out the merge in Word, merge the data from the 'Merge Sheet' listed in the final column.
You can find them all here and they are all free. They are offered for download and use under a Creative Commons Licence. That means you can freely use and adapt them other than commercially. You can share these templates with other people, but you must not charge for them. You need to decide whether you just want to download prepared layout sets for the full year, or whether you want to roll up your sleeves and adapt our templates to your own needs.
Some of the things you can do when you have the source files are:
- Make and print a diary for any year: past, present or future
- Adjust the space given to different days or other template elements
- Add your name or other details to each page
- Make a diary for part of a year, or for any period you want
- Change the language of month and day names

Follow the links below to the blog posts where the files are to be found.